Introduction

Reduction of child mortality is reflected in several of the United Nations’ Sustainable Development Goals and is a key indicator of human progress. The UN expects that by 2030, countries end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under‑5 mortality to at least as low as 25 per 1,000 live births. Parallel to notion of child mortality is of course maternal mortality, which accounts for 295 000 deaths during and following pregnancy and childbirth (as of 2017). The vast majority of these deaths (94%) occurred in low-resource settings, and most could have been prevented. In light of what was mentioned above, Cardiotocograms (CTGs) are a simple and cost accessible option to assess fetal health, allowing healthcare professionals to take action in order to prevent child and maternal mortality. The equipment itself works by sending ultrasound pulses and reading its response, thus shedding light on fetal heart rate (FHR), fetal movements, uterine contractions and more.

Goal

The goal of this part of the homework is to apply all statistical tools studied in order to classify, in the best way possible (what is the “best” way possible will be discussed later on), a new fetus depending on various features. For this, we will use Quadratic Discriminant Analysis, Linear Discriminant Analysis, Naive Bayes and Logistic Regression.

Installing libraries

library("ggplot2")
library("MASS")
library("e1071") 
library("gridExtra")
library("ggcorrplot")
library("VGAM")
## Loading required package: stats4
## Loading required package: splines
library("rrcov")
## Loading required package: robustbase
## Scalable Robust Estimators with High Breakdown Point (version 1.5-5)
library("bestNormalize")
## 
## Attaching package: 'bestNormalize'
## The following object is masked from 'package:MASS':
## 
##     boxcox
library("outliers")
library("EnvStats")
## Registered S3 method overwritten by 'EnvStats':
##   method      from         
##   plot.boxcox bestNormalize
## 
## Attaching package: 'EnvStats'
## The following object is masked from 'package:bestNormalize':
## 
##     boxcox
## The following object is masked from 'package:rrcov':
## 
##     predict
## The following objects are masked from 'package:VGAM':
## 
##     calibrate, dpareto, ppareto, predict, qpareto, rpareto
## The following objects are masked from 'package:e1071':
## 
##     kurtosis, skewness
## The following object is masked from 'package:MASS':
## 
##     boxcox
## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm
## The following object is masked from 'package:base':
## 
##     print.default
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("plotrix")
library("caret")
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:VGAM':
## 
##     predictors
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble  3.0.3     ✓ purrr   0.3.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::combine() masks gridExtra::combine()
## x tidyr::fill()    masks VGAM::fill()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## x purrr::lift()    masks caret::lift()
## x dplyr::select()  masks MASS::select()
library("penalizedLDA")
library("bnclassify")
## 
## Attaching package: 'bnclassify'
## The following object is masked from 'package:dplyr':
## 
##     vars
## The following object is masked from 'package:EnvStats':
## 
##     cv
## The following object is masked from 'package:ggplot2':
## 
##     vars
library("naivebayes")
## naivebayes 0.9.7 loaded
library("MLeval")
library("caTools")
library("factoextra")
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library("FactoMineR")
library("paran")

####Defining Colours

color_1 <- "blue"
color_2 <- "red"
color_3 <- "yellow"

Data Source

This dataset contains 2126 records of features extracted from Cardiotocogram exams, which were then classified by three expert obstetritians into 3 classes: Normal (0) Suspect (1) Pathological (2)

First, let´s read the data and study it.

data <- read.csv("fetal_health.csv")
dim(data)
## [1] 2126   22
head(data)
##   baseline.value accelerations fetal_movement uterine_contractions
## 1            120         0.000              0                0.000
## 2            132         0.006              0                0.006
## 3            133         0.003              0                0.008
## 4            134         0.003              0                0.008
## 5            132         0.007              0                0.008
## 6            134         0.001              0                0.010
##   light_decelerations severe_decelerations prolongued_decelerations
## 1               0.000                    0                    0.000
## 2               0.003                    0                    0.000
## 3               0.003                    0                    0.000
## 4               0.003                    0                    0.000
## 5               0.000                    0                    0.000
## 6               0.009                    0                    0.002
##   abnormal_short_term_variability mean_value_of_short_term_variability
## 1                              73                                  0.5
## 2                              17                                  2.1
## 3                              16                                  2.1
## 4                              16                                  2.4
## 5                              16                                  2.4
## 6                              26                                  5.9
##   percentage_of_time_with_abnormal_long_term_variability
## 1                                                     43
## 2                                                      0
## 3                                                      0
## 4                                                      0
## 5                                                      0
## 6                                                      0
##   mean_value_of_long_term_variability histogram_width histogram_min
## 1                                 2.4              64            62
## 2                                10.4             130            68
## 3                                13.4             130            68
## 4                                23.0             117            53
## 5                                19.9             117            53
## 6                                 0.0             150            50
##   histogram_max histogram_number_of_peaks histogram_number_of_zeroes
## 1           126                         2                          0
## 2           198                         6                          1
## 3           198                         5                          1
## 4           170                        11                          0
## 5           170                         9                          0
## 6           200                         5                          3
##   histogram_mode histogram_mean histogram_median histogram_variance
## 1            120            137              121                 73
## 2            141            136              140                 12
## 3            141            135              138                 13
## 4            137            134              137                 13
## 5            137            136              138                 11
## 6             76            107              107                170
##   histogram_tendency fetal_health
## 1                  1            2
## 2                  0            1
## 3                  0            1
## 4                  1            1
## 5                  1            1
## 6                  0            3
summary(data)
##  baseline.value  accelerations      fetal_movement     uterine_contractions
##  Min.   :106.0   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000    
##  1st Qu.:126.0   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.002000    
##  Median :133.0   Median :0.002000   Median :0.000000   Median :0.004000    
##  Mean   :133.3   Mean   :0.003178   Mean   :0.009481   Mean   :0.004366    
##  3rd Qu.:140.0   3rd Qu.:0.006000   3rd Qu.:0.003000   3rd Qu.:0.007000    
##  Max.   :160.0   Max.   :0.019000   Max.   :0.481000   Max.   :0.015000    
##  light_decelerations severe_decelerations prolongued_decelerations
##  Min.   :0.000000    Min.   :0.000e+00    Min.   :0.0000000       
##  1st Qu.:0.000000    1st Qu.:0.000e+00    1st Qu.:0.0000000       
##  Median :0.000000    Median :0.000e+00    Median :0.0000000       
##  Mean   :0.001889    Mean   :3.293e-06    Mean   :0.0001585       
##  3rd Qu.:0.003000    3rd Qu.:0.000e+00    3rd Qu.:0.0000000       
##  Max.   :0.015000    Max.   :1.000e-03    Max.   :0.0050000       
##  abnormal_short_term_variability mean_value_of_short_term_variability
##  Min.   :12.00                   Min.   :0.200                       
##  1st Qu.:32.00                   1st Qu.:0.700                       
##  Median :49.00                   Median :1.200                       
##  Mean   :46.99                   Mean   :1.333                       
##  3rd Qu.:61.00                   3rd Qu.:1.700                       
##  Max.   :87.00                   Max.   :7.000                       
##  percentage_of_time_with_abnormal_long_term_variability
##  Min.   : 0.000                                        
##  1st Qu.: 0.000                                        
##  Median : 0.000                                        
##  Mean   : 9.847                                        
##  3rd Qu.:11.000                                        
##  Max.   :91.000                                        
##  mean_value_of_long_term_variability histogram_width  histogram_min   
##  Min.   : 0.000                      Min.   :  3.00   Min.   : 50.00  
##  1st Qu.: 4.600                      1st Qu.: 37.00   1st Qu.: 67.00  
##  Median : 7.400                      Median : 67.50   Median : 93.00  
##  Mean   : 8.188                      Mean   : 70.45   Mean   : 93.58  
##  3rd Qu.:10.800                      3rd Qu.:100.00   3rd Qu.:120.00  
##  Max.   :50.700                      Max.   :180.00   Max.   :159.00  
##  histogram_max histogram_number_of_peaks histogram_number_of_zeroes
##  Min.   :122   Min.   : 0.000            Min.   : 0.0000           
##  1st Qu.:152   1st Qu.: 2.000            1st Qu.: 0.0000           
##  Median :162   Median : 3.000            Median : 0.0000           
##  Mean   :164   Mean   : 4.068            Mean   : 0.3236           
##  3rd Qu.:174   3rd Qu.: 6.000            3rd Qu.: 0.0000           
##  Max.   :238   Max.   :18.000            Max.   :10.0000           
##  histogram_mode  histogram_mean  histogram_median histogram_variance
##  Min.   : 60.0   Min.   : 73.0   Min.   : 77.0    Min.   :  0.00    
##  1st Qu.:129.0   1st Qu.:125.0   1st Qu.:129.0    1st Qu.:  2.00    
##  Median :139.0   Median :136.0   Median :139.0    Median :  7.00    
##  Mean   :137.5   Mean   :134.6   Mean   :138.1    Mean   : 18.81    
##  3rd Qu.:148.0   3rd Qu.:145.0   3rd Qu.:148.0    3rd Qu.: 24.00    
##  Max.   :187.0   Max.   :182.0   Max.   :186.0    Max.   :269.00    
##  histogram_tendency  fetal_health  
##  Min.   :-1.0000    Min.   :1.000  
##  1st Qu.: 0.0000    1st Qu.:1.000  
##  Median : 0.0000    Median :1.000  
##  Mean   : 0.3203    Mean   :1.304  
##  3rd Qu.: 1.0000    3rd Qu.:1.000  
##  Max.   : 1.0000    Max.   :3.000

Data Pre-processing

Missing Values

We have to check if there are any missing values in our data.

n_miss <- apply(is.na(data),1,sum)
table(n_miss)
## n_miss
##    0 
## 2126

As we can see, there are no missing values.

Let´s check the variables Description

lapply(data,class)
## $baseline.value
## [1] "numeric"
## 
## $accelerations
## [1] "numeric"
## 
## $fetal_movement
## [1] "numeric"
## 
## $uterine_contractions
## [1] "numeric"
## 
## $light_decelerations
## [1] "numeric"
## 
## $severe_decelerations
## [1] "numeric"
## 
## $prolongued_decelerations
## [1] "numeric"
## 
## $abnormal_short_term_variability
## [1] "numeric"
## 
## $mean_value_of_short_term_variability
## [1] "numeric"
## 
## $percentage_of_time_with_abnormal_long_term_variability
## [1] "numeric"
## 
## $mean_value_of_long_term_variability
## [1] "numeric"
## 
## $histogram_width
## [1] "numeric"
## 
## $histogram_min
## [1] "numeric"
## 
## $histogram_max
## [1] "numeric"
## 
## $histogram_number_of_peaks
## [1] "numeric"
## 
## $histogram_number_of_zeroes
## [1] "numeric"
## 
## $histogram_mode
## [1] "numeric"
## 
## $histogram_mean
## [1] "numeric"
## 
## $histogram_median
## [1] "numeric"
## 
## $histogram_variance
## [1] "numeric"
## 
## $histogram_tendency
## [1] "numeric"
## 
## $fetal_health
## [1] "numeric"

Target Variable: “fetal health”

This is a categorical variable, as it clasifies the overall health of the fetus. It takes value: 1- normal helath (0) 2- suspect health (1) 3- dangerous health (2)

Let´s have a look at its distribution.

summary(data$fetal_health)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.304   1.000   3.000
target.freq <- table(data$fetal_health)
target.freq
## 
##    1    2    3 
## 1655  295  176
target.freq.rel <- target.freq/nrow(data)
target.freq.rel
## 
##          1          2          3 
## 0.77845720 0.13875823 0.08278457

We can see that the classes are pretty unbalanced. This may become problematic with bayesian classifiers, as it is best if classes are balanced. We also see there is a clear predominancie of healthy fetus, and a minority of suspect and even less pathological fetus. Actually, almost 78% of the fetus in our data are considered healthy, while almost 14% suspicious and just 8% are in danger. This is reasonable: is we consider this a random sample, then most of the fetus are usually healthier, while it is not so common to find suspected or problematic fetus. After going through most of the research, I decided to go back to the beggining and start again with a target variable of only two levels. This is because there was too much noise on the models and very little information about some subgroups. After talking with a neonathologist, I decided that merging the suspicious and dangerous fetus is not a bad idea. This way the classes will be more balanced, and in practical effects, both suspisious and dangerous fetus would have to be subjected to deeper check.

data$fetal_health = factor(ifelse(data$fetal_health == 1, "Healthy" , "Warning"))

target.freq <- table(data$fetal_health)
target.freq
## 
## Healthy Warning 
##    1655     471
target.freq.rel <- target.freq/nrow(data)
target.freq.rel
## 
##   Healthy   Warning 
## 0.7784572 0.2215428

Some useful plots

lbles <- paste(names(target.freq), "\n", target.freq, sep="")
pie3D(target.freq,labels=lbles,explode=0.1,main="Pie Chart of Fetal Health",)

ggplot(data=data, aes(x = fetal_health)) + geom_bar(aes(fill = fetal_health)) + ggtitle("Bar Plot for Fetal Health")

We can see that the classes are pretty unbalanced. This may become problematic with bayesian classifiers, as it is best if classes are balanced. We also see there is a clear predominancie of healthy fetus, and a minority of suspect and even less pathological fetus. Actually, almost 78% of the fetus in our data are considered healthy, while almost 14% suspicious and just 8% are in danger. This is reasonable: is we consider this a random sample, then most of the fetus are usually healthier, while it is not so common to find suspected or problematic fetus.

features <- setdiff(colnames(data),'fetal_health')
features
##  [1] "baseline.value"                                        
##  [2] "accelerations"                                         
##  [3] "fetal_movement"                                        
##  [4] "uterine_contractions"                                  
##  [5] "light_decelerations"                                   
##  [6] "severe_decelerations"                                  
##  [7] "prolongued_decelerations"                              
##  [8] "abnormal_short_term_variability"                       
##  [9] "mean_value_of_short_term_variability"                  
## [10] "percentage_of_time_with_abnormal_long_term_variability"
## [11] "mean_value_of_long_term_variability"                   
## [12] "histogram_width"                                       
## [13] "histogram_min"                                         
## [14] "histogram_max"                                         
## [15] "histogram_number_of_peaks"                             
## [16] "histogram_number_of_zeroes"                            
## [17] "histogram_mode"                                        
## [18] "histogram_mean"                                        
## [19] "histogram_median"                                      
## [20] "histogram_variance"                                    
## [21] "histogram_tendency"

We have 21 features, described as follows. 1) Baseline Value 2) Accelerations 3) Fetal Movement 4) Uterine Contractions 5) Light Decelerations 6) Severe decelerations 7) Prolongued Decelerations 8) Abnormal Short Term Variability 9) Mean Value of Short Term Variability 10) Percentage of time with Abnormal Long Term Variability 11) Mean Value of Long term Variability 12) Histogram Width 13) Histogram Min 14) Histogram Max 15) Histogram Number of Peaks 16) Histogram Number of Zeroes 17) Histogram Mode 18) Histogram Mean 19) Histogram Median 20) Histogram Variance 21) Histogram Tendency

Let´s explore their histograms

for (f in features) {
  hist(data[,f], main=f)
}

Let´s study the problem in one dimension, to see if there is any single feature that helps us distinguish between the tree classes.

dat <- data
dat$fetal_health <- as.factor(dat$fetal_health)

univar_graph <- function(univar_name, univar, data, output_var) {
  g_1 <- ggplot(data, aes(x=univar)) + geom_density() + xlab(univar_name)
  g_2 <- ggplot(data, aes(x=univar, fill=output_var)) + geom_density(alpha=0.4) + xlab(univar_name)
  grid.arrange(g_1, g_2, ncol=2, top=paste(univar_name,"variable", "/ [ Skew:",skewness(univar),"]"))
}

for (x in 1:(ncol(data)-1)) {
  univar_graph(names(data)[x], data[,x], data, dat[,'fetal_health'])
}

We can see that variable “accelerations” distinguish a bit the two groups of interest.Also, “severe decelerations” look like it also distinguishes the two groups (as if the value it takes is different to zero, it looks like it is always talking about a warning fetus). The same happens with prolongued decelerations.Most of the other variables tend to be pretty similar in distribution between groups.

It would be interesting to perform the same analysis, but in a two dimessional perspective. unfortunatly ploting 21 quantitative variables with each other would result in a combination of: (21*21)-21 = 420 scatterplots. So we would have to analyze 110 scatterplots to see this properly. As this is not very reasonable, one good way of studying what variables may be influential to classify our data, we can use principal component analysis (this will be used later on).

Some corrections to be made

There are some variables that get a lot (most of the observations) of zeros. For instance, most of the cases are of healthy fetus, so the “severe decelerations” are almost not seen in the data. The difficulty arises when we combine continuous variables with variables that take almost always the same value. This can be a problem, and can be caracterized as zero and near-zero variance predictors.

nzv <- nearZeroVar(data, saveMetrics= TRUE)
nzv[nzv$nzv,][1:21,]
##                                                        freqRatio percentUnique
## severe_decelerations                                   302.71429    0.09407338
## prolongued_decelerations                                27.05556    0.28222013
## percentage_of_time_with_abnormal_long_term_variability  23.84615    4.09219191
## NA                                                            NA            NA
## NA.1                                                          NA            NA
## NA.2                                                          NA            NA
## NA.3                                                          NA            NA
## NA.4                                                          NA            NA
## NA.5                                                          NA            NA
## NA.6                                                          NA            NA
## NA.7                                                          NA            NA
## NA.8                                                          NA            NA
## NA.9                                                          NA            NA
## NA.10                                                         NA            NA
## NA.11                                                         NA            NA
## NA.12                                                         NA            NA
## NA.13                                                         NA            NA
## NA.14                                                         NA            NA
## NA.15                                                         NA            NA
## NA.16                                                         NA            NA
## NA.17                                                         NA            NA
##                                                        zeroVar  nzv
## severe_decelerations                                     FALSE TRUE
## prolongued_decelerations                                 FALSE TRUE
## percentage_of_time_with_abnormal_long_term_variability   FALSE TRUE
## NA                                                          NA   NA
## NA.1                                                        NA   NA
## NA.2                                                        NA   NA
## NA.3                                                        NA   NA
## NA.4                                                        NA   NA
## NA.5                                                        NA   NA
## NA.6                                                        NA   NA
## NA.7                                                        NA   NA
## NA.8                                                        NA   NA
## NA.9                                                        NA   NA
## NA.10                                                       NA   NA
## NA.11                                                       NA   NA
## NA.12                                                       NA   NA
## NA.13                                                       NA   NA
## NA.14                                                       NA   NA
## NA.15                                                       NA   NA
## NA.16                                                       NA   NA
## NA.17                                                       NA   NA

We can see that severe_decelerations may be a higly problematic variable in terms of variability. After considering and trying different options, I decided to keep it. This is because most models run with all this, and the truth is that “severe decelerations” almost does not vary because for healthy fetus is always takes value “0” and changes only for very extreme cases of unhelathy fetus. Having very unbalanced classes (still after combining suspicious and dangerous fetus), it would not be a very good idea to take out this variable, knowing that it has to do with very dangerous fetus. If i take it out from the models, then we would lose really valuable information about unhealthy fetus.

Let´s check for correlations, as this may also become a problem. If there are variables that are extremely linearly correlated, then our models will strike.

Correlation Matrix

R <- cor(data[,1:21])
R
##                                                        baseline.value
## baseline.value                                            1.000000000
## accelerations                                            -0.080559728
## fetal_movement                                           -0.033436480
## uterine_contractions                                     -0.146373271
## light_decelerations                                      -0.159031878
## severe_decelerations                                     -0.053517550
## prolongued_decelerations                                 -0.104597095
## abnormal_short_term_variability                           0.305570030
## mean_value_of_short_term_variability                     -0.279606597
## percentage_of_time_with_abnormal_long_term_variability    0.285629578
## mean_value_of_long_term_variability                      -0.032091023
## histogram_width                                          -0.147678762
## histogram_min                                             0.361619496
## histogram_max                                             0.275109794
## histogram_number_of_peaks                                -0.113933323
## histogram_number_of_zeroes                               -0.004744583
## histogram_mode                                            0.708992500
## histogram_mean                                            0.723121038
## histogram_median                                          0.789246407
## histogram_variance                                       -0.133938054
## histogram_tendency                                        0.293503485
##                                                        accelerations
## baseline.value                                          -0.080559728
## accelerations                                            1.000000000
## fetal_movement                                           0.048234576
## uterine_contractions                                     0.089674232
## light_decelerations                                     -0.108614752
## severe_decelerations                                    -0.043018117
## prolongued_decelerations                                -0.127748624
## abnormal_short_term_variability                         -0.279577473
## mean_value_of_short_term_variability                     0.207169746
## percentage_of_time_with_abnormal_long_term_variability  -0.373943010
## mean_value_of_long_term_variability                     -0.142363119
## histogram_width                                          0.298631025
## histogram_min                                           -0.154286330
## histogram_max                                            0.394146773
## histogram_number_of_peaks                                0.190451945
## histogram_number_of_zeroes                              -0.006146567
## histogram_mode                                           0.243609990
## histogram_mean                                           0.270334428
## histogram_median                                         0.272849129
## histogram_variance                                       0.125703963
## histogram_tendency                                       0.028419837
##                                                        fetal_movement
## baseline.value                                            -0.03343648
## accelerations                                              0.04823458
## fetal_movement                                             1.00000000
## uterine_contractions                                      -0.06877872
## light_decelerations                                        0.04922850
## severe_decelerations                                      -0.01097563
## prolongued_decelerations                                   0.26592207
## abnormal_short_term_variability                           -0.10371542
## mean_value_of_short_term_variability                       0.12131421
## percentage_of_time_with_abnormal_long_term_variability    -0.07409550
## mean_value_of_long_term_variability                        0.01104743
## histogram_width                                            0.16279008
## histogram_min                                             -0.15391727
## histogram_max                                              0.09985259
## histogram_number_of_peaks                                  0.16465407
## histogram_number_of_zeroes                                -0.01774917
## histogram_mode                                            -0.06119241
## histogram_mean                                            -0.08967121
## histogram_median                                          -0.07232949
## histogram_variance                                         0.17933965
## histogram_tendency                                        -0.00154140
##                                                        uterine_contractions
## baseline.value                                                 -0.146373271
## accelerations                                                   0.089674232
## fetal_movement                                                 -0.068778721
## uterine_contractions                                            1.000000000
## light_decelerations                                             0.285079038
## severe_decelerations                                            0.006788278
## prolongued_decelerations                                        0.077036104
## abnormal_short_term_variability                                -0.232810664
## mean_value_of_short_term_variability                            0.289678600
## percentage_of_time_with_abnormal_long_term_variability         -0.306607677
## mean_value_of_long_term_variability                            -0.066058165
## histogram_width                                                 0.142541039
## histogram_min                                                  -0.113323117
## histogram_max                                                   0.122765666
## histogram_number_of_peaks                                       0.082692963
## histogram_number_of_zeroes                                      0.057894493
## histogram_mode                                                 -0.104853944
## histogram_mean                                                 -0.187504748
## histogram_median                                               -0.140287407
## histogram_variance                                              0.238581881
## histogram_tendency                                             -0.072313551
##                                                        light_decelerations
## baseline.value                                               -1.590319e-01
## accelerations                                                -1.086148e-01
## fetal_movement                                                4.922850e-02
## uterine_contractions                                          2.850790e-01
## light_decelerations                                           1.000000e+00
## severe_decelerations                                          1.075730e-01
## prolongued_decelerations                                      2.256113e-01
## abnormal_short_term_variability                              -1.191518e-01
## mean_value_of_short_term_variability                          5.621699e-01
## percentage_of_time_with_abnormal_long_term_variability       -2.712825e-01
## mean_value_of_long_term_variability                          -2.429321e-01
## histogram_width                                               5.204674e-01
## histogram_min                                                -5.535336e-01
## histogram_max                                                 2.180426e-01
## histogram_number_of_peaks                                     3.976204e-01
## histogram_number_of_zeroes                                    2.352958e-01
## histogram_mode                                               -3.472325e-01
## histogram_mean                                               -5.273544e-01
## histogram_median                                             -3.885856e-01
## histogram_variance                                            5.642894e-01
## histogram_tendency                                            7.161302e-05
##                                                        severe_decelerations
## baseline.value                                                 -0.053517550
## accelerations                                                  -0.043018117
## fetal_movement                                                 -0.010975631
## uterine_contractions                                            0.006788278
## light_decelerations                                             0.107572979
## severe_decelerations                                            1.000000000
## prolongued_decelerations                                        0.012395431
## abnormal_short_term_variability                                 0.033948594
## mean_value_of_short_term_variability                            0.034129856
## percentage_of_time_with_abnormal_long_term_variability         -0.030770213
## mean_value_of_long_term_variability                            -0.037666897
## histogram_width                                                 0.044879585
## histogram_min                                                  -0.071973814
## histogram_max                                                  -0.021134754
## histogram_number_of_peaks                                       0.007024261
## histogram_number_of_zeroes                                      0.043441387
## histogram_mode                                                 -0.215160740
## histogram_mean                                                 -0.158672869
## histogram_median                                               -0.160451257
## histogram_variance                                              0.136420521
## histogram_tendency                                             -0.070483116
##                                                        prolongued_decelerations
## baseline.value                                                      -0.10459709
## accelerations                                                       -0.12774862
## fetal_movement                                                       0.26592207
## uterine_contractions                                                 0.07703610
## light_decelerations                                                  0.22561133
## severe_decelerations                                                 0.01239543
## prolongued_decelerations                                             1.00000000
## abnormal_short_term_variability                                      0.04622569
## mean_value_of_short_term_variability                                 0.26701062
## percentage_of_time_with_abnormal_long_term_variability              -0.13733337
## mean_value_of_long_term_variability                                 -0.22651353
## histogram_width                                                      0.26539100
## histogram_min                                                       -0.27676450
## histogram_max                                                        0.12022129
## histogram_number_of_peaks                                            0.22285969
## histogram_number_of_zeroes                                           0.05642320
## histogram_mode                                                      -0.43641595
## histogram_mean                                                      -0.48866344
## histogram_median                                                    -0.44477787
## histogram_variance                                                   0.50330146
## histogram_tendency                                                  -0.21540452
##                                                        abnormal_short_term_variability
## baseline.value                                                             0.305570030
## accelerations                                                             -0.279577473
## fetal_movement                                                            -0.103715419
## uterine_contractions                                                      -0.232810664
## light_decelerations                                                       -0.119151839
## severe_decelerations                                                       0.033948594
## prolongued_decelerations                                                   0.046225691
## abnormal_short_term_variability                                            1.000000000
## mean_value_of_short_term_variability                                      -0.430704976
## percentage_of_time_with_abnormal_long_term_variability                     0.459412717
## mean_value_of_long_term_variability                                       -0.315104671
## histogram_width                                                           -0.260462577
## histogram_min                                                              0.275377994
## histogram_max                                                             -0.111806050
## histogram_number_of_peaks                                                 -0.167561291
## histogram_number_of_zeroes                                                -0.149296431
## histogram_mode                                                             0.058363094
## histogram_mean                                                             0.074553692
## histogram_median                                                           0.119960169
## histogram_variance                                                        -0.146433817
## histogram_tendency                                                        -0.005747923
##                                                        mean_value_of_short_term_variability
## baseline.value                                                                  -0.27960660
## accelerations                                                                    0.20716975
## fetal_movement                                                                   0.12131421
## uterine_contractions                                                             0.28967860
## light_decelerations                                                              0.56216989
## severe_decelerations                                                             0.03412986
## prolongued_decelerations                                                         0.26701062
## abnormal_short_term_variability                                                 -0.43070498
## mean_value_of_short_term_variability                                             1.00000000
## percentage_of_time_with_abnormal_long_term_variability                          -0.47025888
## mean_value_of_long_term_variability                                              0.07389177
## histogram_width                                                                  0.66084678
## histogram_min                                                                   -0.62256925
## histogram_max                                                                    0.40907212
## histogram_number_of_peaks                                                        0.50142950
## histogram_number_of_zeroes                                                       0.26618303
## histogram_mode                                                                  -0.30758650
## histogram_mean                                                                  -0.44540138
## histogram_median                                                                -0.33610928
## histogram_variance                                                               0.55585244
## histogram_tendency                                                              -0.06613969
##                                                        percentage_of_time_with_abnormal_long_term_variability
## baseline.value                                                                                     0.28562958
## accelerations                                                                                     -0.37394301
## fetal_movement                                                                                    -0.07409550
## uterine_contractions                                                                              -0.30660768
## light_decelerations                                                                               -0.27128246
## severe_decelerations                                                                              -0.03077021
## prolongued_decelerations                                                                          -0.13733337
## abnormal_short_term_variability                                                                    0.45941272
## mean_value_of_short_term_variability                                                              -0.47025888
## percentage_of_time_with_abnormal_long_term_variability                                             1.00000000
## mean_value_of_long_term_variability                                                               -0.17111405
## histogram_width                                                                                   -0.45129650
## histogram_min                                                                                      0.42283438
## histogram_max                                                                                     -0.28318335
## histogram_number_of_peaks                                                                         -0.27930068
## histogram_number_of_zeroes                                                                        -0.12178368
## histogram_mode                                                                                     0.16521142
## histogram_mean                                                                                     0.22232062
## histogram_median                                                                                   0.18648019
## histogram_variance                                                                                -0.28153615
## histogram_tendency                                                                                 0.04248124
##                                                        mean_value_of_long_term_variability
## baseline.value                                                                -0.032091023
## accelerations                                                                 -0.142363119
## fetal_movement                                                                 0.011047430
## uterine_contractions                                                          -0.066058165
## light_decelerations                                                           -0.242932065
## severe_decelerations                                                          -0.037666897
## prolongued_decelerations                                                      -0.226513529
## abnormal_short_term_variability                                               -0.315104671
## mean_value_of_short_term_variability                                           0.073891775
## percentage_of_time_with_abnormal_long_term_variability                        -0.171114053
## mean_value_of_long_term_variability                                            1.000000000
## histogram_width                                                                0.110942057
## histogram_min                                                                 -0.144976343
## histogram_max                                                                  0.002022563
## histogram_number_of_peaks                                                      0.056357459
## histogram_number_of_zeroes                                                     0.123869099
## histogram_mode                                                                 0.072070968
## histogram_mean                                                                 0.137813332
## histogram_median                                                               0.063227533
## histogram_variance                                                            -0.164079171
## histogram_tendency                                                             0.153092953
##                                                        histogram_width
## baseline.value                                             -0.14767876
## accelerations                                               0.29863103
## fetal_movement                                              0.16279008
## uterine_contractions                                        0.14254104
## light_decelerations                                         0.52046737
## severe_decelerations                                        0.04487958
## prolongued_decelerations                                    0.26539100
## abnormal_short_term_variability                            -0.26046258
## mean_value_of_short_term_variability                        0.66084678
## percentage_of_time_with_abnormal_long_term_variability     -0.45129650
## mean_value_of_long_term_variability                         0.11094206
## histogram_width                                             1.00000000
## histogram_min                                              -0.89851896
## histogram_max                                               0.69076879
## histogram_number_of_peaks                                   0.74707094
## histogram_number_of_zeroes                                  0.31727611
## histogram_mode                                             -0.15926065
## histogram_mean                                             -0.28084632
## histogram_median                                           -0.16885434
## histogram_variance                                          0.61588381
## histogram_tendency                                          0.11815183
##                                                        histogram_min
## baseline.value                                            0.36161950
## accelerations                                            -0.15428633
## fetal_movement                                           -0.15391727
## uterine_contractions                                     -0.11332312
## light_decelerations                                      -0.55353359
## severe_decelerations                                     -0.07197381
## prolongued_decelerations                                 -0.27676450
## abnormal_short_term_variability                           0.27537799
## mean_value_of_short_term_variability                     -0.62256925
## percentage_of_time_with_abnormal_long_term_variability    0.42283438
## mean_value_of_long_term_variability                      -0.14497634
## histogram_width                                          -0.89851896
## histogram_min                                             1.00000000
## histogram_max                                            -0.30328584
## histogram_number_of_peaks                                -0.67028685
## histogram_number_of_zeroes                               -0.30656707
## histogram_mode                                            0.35306685
## histogram_mean                                            0.48612117
## histogram_median                                          0.40019105
## histogram_variance                                       -0.54509064
## histogram_tendency                                       -0.24257913
##                                                        histogram_max
## baseline.value                                           0.275109794
## accelerations                                            0.394146773
## fetal_movement                                           0.099852590
## uterine_contractions                                     0.122765666
## light_decelerations                                      0.218042634
## severe_decelerations                                    -0.021134754
## prolongued_decelerations                                 0.120221292
## abnormal_short_term_variability                         -0.111806050
## mean_value_of_short_term_variability                     0.409072124
## percentage_of_time_with_abnormal_long_term_variability  -0.283183354
## mean_value_of_long_term_variability                      0.002022563
## histogram_width                                          0.690768790
## histogram_min                                           -0.303285839
## histogram_max                                            1.000000000
## histogram_number_of_peaks                                0.517652125
## histogram_number_of_zeroes                               0.183765568
## histogram_mode                                           0.235877114
## histogram_mean                                           0.191108278
## histogram_median                                         0.292679501
## histogram_variance                                       0.439094134
## histogram_tendency                                      -0.143110677
##                                                        histogram_number_of_peaks
## baseline.value                                                      -0.113933323
## accelerations                                                        0.190451945
## fetal_movement                                                       0.164654071
## uterine_contractions                                                 0.082692963
## light_decelerations                                                  0.397620358
## severe_decelerations                                                 0.007024261
## prolongued_decelerations                                             0.222859690
## abnormal_short_term_variability                                     -0.167561291
## mean_value_of_short_term_variability                                 0.501429501
## percentage_of_time_with_abnormal_long_term_variability              -0.279300683
## mean_value_of_long_term_variability                                  0.056357459
## histogram_width                                                      0.747070940
## histogram_min                                                       -0.670286852
## histogram_max                                                        0.517652125
## histogram_number_of_peaks                                            1.000000000
## histogram_number_of_zeroes                                           0.288592536
## histogram_mode                                                      -0.100123321
## histogram_mean                                                      -0.217242336
## histogram_median                                                    -0.122844095
## histogram_variance                                                   0.454353980
## histogram_tendency                                                   0.110114103
##                                                        histogram_number_of_zeroes
## baseline.value                                                       -0.004744583
## accelerations                                                        -0.006146567
## fetal_movement                                                       -0.017749174
## uterine_contractions                                                  0.057894493
## light_decelerations                                                   0.235295786
## severe_decelerations                                                  0.043441387
## prolongued_decelerations                                              0.056423204
## abnormal_short_term_variability                                      -0.149296431
## mean_value_of_short_term_variability                                  0.266183026
## percentage_of_time_with_abnormal_long_term_variability               -0.121783680
## mean_value_of_long_term_variability                                   0.123869099
## histogram_width                                                       0.317276110
## histogram_min                                                        -0.306567072
## histogram_max                                                         0.183765568
## histogram_number_of_peaks                                             0.288592536
## histogram_number_of_zeroes                                            1.000000000
## histogram_mode                                                       -0.057978109
## histogram_mean                                                       -0.084417291
## histogram_median                                                     -0.052896401
## histogram_variance                                                    0.196746881
## histogram_tendency                                                    0.084694222
##                                                        histogram_mode
## baseline.value                                             0.70899250
## accelerations                                              0.24360999
## fetal_movement                                            -0.06119241
## uterine_contractions                                      -0.10485394
## light_decelerations                                       -0.34723254
## severe_decelerations                                      -0.21516074
## prolongued_decelerations                                  -0.43641595
## abnormal_short_term_variability                            0.05836309
## mean_value_of_short_term_variability                      -0.30758650
## percentage_of_time_with_abnormal_long_term_variability     0.16521142
## mean_value_of_long_term_variability                        0.07207097
## histogram_width                                           -0.15926065
## histogram_min                                              0.35306685
## histogram_max                                              0.23587711
## histogram_number_of_peaks                                 -0.10012332
## histogram_number_of_zeroes                                -0.05797811
## histogram_mode                                             1.00000000
## histogram_mean                                             0.89341238
## histogram_median                                           0.93339916
## histogram_variance                                        -0.31276475
## histogram_tendency                                         0.41556388
##                                                        histogram_mean
## baseline.value                                             0.72312104
## accelerations                                              0.27033443
## fetal_movement                                            -0.08967121
## uterine_contractions                                      -0.18750475
## light_decelerations                                       -0.52735437
## severe_decelerations                                      -0.15867287
## prolongued_decelerations                                  -0.48866344
## abnormal_short_term_variability                            0.07455369
## mean_value_of_short_term_variability                      -0.44540138
## percentage_of_time_with_abnormal_long_term_variability     0.22232062
## mean_value_of_long_term_variability                        0.13781333
## histogram_width                                           -0.28084632
## histogram_min                                              0.48612117
## histogram_max                                              0.19110828
## histogram_number_of_peaks                                 -0.21724234
## histogram_number_of_zeroes                                -0.08441729
## histogram_mode                                             0.89341238
## histogram_mean                                             1.00000000
## histogram_median                                           0.94825134
## histogram_variance                                        -0.39942323
## histogram_tendency                                         0.32707559
##                                                        histogram_median
## baseline.value                                               0.78924641
## accelerations                                                0.27284913
## fetal_movement                                              -0.07232949
## uterine_contractions                                        -0.14028741
## light_decelerations                                         -0.38858559
## severe_decelerations                                        -0.16045126
## prolongued_decelerations                                    -0.44477787
## abnormal_short_term_variability                              0.11996017
## mean_value_of_short_term_variability                        -0.33610928
## percentage_of_time_with_abnormal_long_term_variability       0.18648019
## mean_value_of_long_term_variability                          0.06322753
## histogram_width                                             -0.16885434
## histogram_min                                                0.40019105
## histogram_max                                                0.29267950
## histogram_number_of_peaks                                   -0.12284410
## histogram_number_of_zeroes                                  -0.05289640
## histogram_mode                                               0.93339916
## histogram_mean                                               0.94825134
## histogram_median                                             1.00000000
## histogram_variance                                          -0.29410467
## histogram_tendency                                           0.38920975
##                                                        histogram_variance
## baseline.value                                                -0.13393805
## accelerations                                                  0.12570396
## fetal_movement                                                 0.17933965
## uterine_contractions                                           0.23858188
## light_decelerations                                            0.56428941
## severe_decelerations                                           0.13642052
## prolongued_decelerations                                       0.50330146
## abnormal_short_term_variability                               -0.14643382
## mean_value_of_short_term_variability                           0.55585244
## percentage_of_time_with_abnormal_long_term_variability        -0.28153615
## mean_value_of_long_term_variability                           -0.16407917
## histogram_width                                                0.61588381
## histogram_min                                                 -0.54509064
## histogram_max                                                  0.43909413
## histogram_number_of_peaks                                      0.45435398
## histogram_number_of_zeroes                                     0.19674688
## histogram_mode                                                -0.31276475
## histogram_mean                                                -0.39942323
## histogram_median                                              -0.29410467
## histogram_variance                                             1.00000000
## histogram_tendency                                            -0.07801258
##                                                        histogram_tendency
## baseline.value                                               2.935035e-01
## accelerations                                                2.841984e-02
## fetal_movement                                              -1.541400e-03
## uterine_contractions                                        -7.231355e-02
## light_decelerations                                          7.161302e-05
## severe_decelerations                                        -7.048312e-02
## prolongued_decelerations                                    -2.154045e-01
## abnormal_short_term_variability                             -5.747923e-03
## mean_value_of_short_term_variability                        -6.613969e-02
## percentage_of_time_with_abnormal_long_term_variability       4.248124e-02
## mean_value_of_long_term_variability                          1.530930e-01
## histogram_width                                              1.181518e-01
## histogram_min                                               -2.425791e-01
## histogram_max                                               -1.431107e-01
## histogram_number_of_peaks                                    1.101141e-01
## histogram_number_of_zeroes                                   8.469422e-02
## histogram_mode                                               4.155639e-01
## histogram_mean                                               3.270756e-01
## histogram_median                                             3.892097e-01
## histogram_variance                                          -7.801258e-02
## histogram_tendency                                           1.000000e+00
ggcorrplot(R,lab=TRUE)

Now, I will proceed to take out some variables that have a perfect colinearity with others, meaning they have either 1 or very close to 1 correlation. When deciding which of the two variables to take out (between a pair of highly correlated variables) I decided to take out those who did not vary a lot between healthy and warning fetus.

X = subset(data, select = -c(histogram_median,histogram_mode,histogram_min,histogram_variance,histogram_mean))
dim(X)
## [1] 2126   17

Checking linear dependencies inside of the matrix

X_matr <- data.matrix(X)
comboInfo <- findLinearCombos(X_matr)
comboInfo
## $linearCombos
## list()
## 
## $remove
## NULL

There are no columns that are a linear combination of others.

Converting to normals

Now I will transform the variables, to make them more symetrical and close to a gaussian. Also, when diong this, sometimes some outliers dissapear, as variability shrinkages.

Let´s transform the variables into gaussians when possible.

X_trans <- X
baseline_value_norm <- bestNormalize(X_trans$baseline.value,out_of_sample = TRUE)
baseline_value_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 2.7836
##  - Box-Cox: 2.6679
##  - Exp(x): 230.5445
##  - Log_b(x+a): 2.7836
##  - No transform: 2.6493
##  - orderNorm (ORQ): 1.8969
##  - sqrt(x + a): 2.6393
##  - Yeo-Johnson: 2.6679
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
##  - 48 unique values 
##  - Original quantiles:
##   0%  25%  50%  75% 100% 
##  106  126  133  140  160
X_trans[,1] <- baseline_value_norm$x.t
colnames(X_trans)[1] <- "baseline_value_norm"

accelerations_norm <- bestNormalize(X_trans$accelerations,out_of_sample = TRUE)
accelerations_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 41.2685
##  - Exp(x): 41.2685
##  - Log_b(x+a): 43.0878
##  - No transform: 41.2685
##  - orderNorm (ORQ): 41.143
##  - sqrt(x + a): 42.178
##  - Yeo-Johnson: 41.323
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
##  - 20 unique values 
##  - Original quantiles:
##    0%   25%   50%   75%  100% 
## 0.000 0.000 0.002 0.006 0.019
X_trans[,2] <- accelerations_norm$x.t
colnames(X_trans)[2] <- "accelerations_norm"

fetal_movement_norm <- bestNormalize(X_trans$fetal_movement,out_of_sample = TRUE)
fetal_movement_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 139.094
##  - Exp(x): 138.3539
##  - Log_b(x+a): 89.1098
##  - No transform: 138.7304
##  - orderNorm (ORQ): 89.9936
##  - sqrt(x + a): 88.2207
##  - Yeo-Johnson: 109.2949
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## Standardized sqrt(x + a) Transformation with 2126 nonmissing obs.:
##  Relevant statistics:
##  - a = 0 
##  - mean (before standardization) = 0.0402022 
##  - sd (before standardization) = 0.08870287
X_trans[,3] <- fetal_movement_norm$x.t
colnames(X_trans)[3] <- "fetal_movement_norm"


uterine_contractions_norm <- bestNormalize(X_trans$uterine_contractions)
uterine_contractions_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 14.4396
##  - Exp(x): 14.4396
##  - Log_b(x+a): 15.809
##  - No transform: 14.4396
##  - orderNorm (ORQ): 14.1491
##  - sqrt(x + a): 14.4808
##  - Yeo-Johnson: 14.526
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
##  - 16 unique values 
##  - Original quantiles:
##    0%   25%   50%   75%  100% 
## 0.000 0.002 0.004 0.007 0.015
X_trans[,4] <- uterine_contractions_norm$x.t
colnames(X_trans)[4] <- "uterine_contractions_norm"

light_decelerations_norm <- bestNormalize(X_trans$light_decelerations)
light_decelerations_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 78.946
##  - Exp(x): 78.9381
##  - Log_b(x+a): 80.275
##  - No transform: 78.946
##  - orderNorm (ORQ): 79.191
##  - sqrt(x + a): 79.8238
##  - Yeo-Johnson: 79.0013
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## Standardized exp(x) Transformation with 2126 nonmissing obs.:
##  Relevant statistics:
##  - mean (before standardization) = 1.001896 
##  - sd (before standardization) = 0.002973372
X_trans[,5] <- light_decelerations_norm$x.t
colnames(X_trans)[5] <- "light_decelerations_norm"


severe_decelerations_norm <- bestNormalize(X_trans$severe_decelerations)
severe_decelerations_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 144.0906
##  - Exp(x): 144.0906
##  - Log_b(x+a): 144.0906
##  - No transform: 144.0906
##  - orderNorm (ORQ): 144.0906
##  - sqrt(x + a): 144.0906
##  - Yeo-Johnson: 144.0906
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## Standardized asinh(x) Transformation with 2126 nonmissing obs.:
##  Relevant statistics:
##  - mean (before standardization) = 3.292568e-06 
##  - sd (before standardization) = 5.729983e-05
X_trans[,6] <- severe_decelerations_norm$x.t
colnames(X_trans)[6] <- "severe_decelerations_norm"

prolongued_decelerations_norm <- bestNormalize(X_trans$prolongued_decelerations)
prolongued_decelerations_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 201.3479
##  - Exp(x): 201.3479
##  - Log_b(x+a): 201.93
##  - No transform: 201.3479
##  - orderNorm (ORQ): 201.9626
##  - sqrt(x + a): 201.9626
##  - Yeo-Johnson: 201.3479
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## Standardized asinh(x) Transformation with 2126 nonmissing obs.:
##  Relevant statistics:
##  - mean (before standardization) = 0.0001585135 
##  - sd (before standardization) = 0.0005899466
X_trans[,7] <- prolongued_decelerations_norm$x.t
colnames(X_trans)[7] <- "prolongued_decelerations_norm"

abnormal_short_term_variability_norm <- bestNormalize(X_trans$abnormal_short_term_variability)
abnormal_short_term_variability_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 5.1521
##  - Box-Cox: 3.4535
##  - Exp(x): 230.6246
##  - Log_b(x+a): 5.1521
##  - No transform: 3.7012
##  - orderNorm (ORQ): 1.3693
##  - sqrt(x + a): 3.6356
##  - Yeo-Johnson: 3.4894
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
##  - 75 unique values 
##  - Original quantiles:
##   0%  25%  50%  75% 100% 
##   12   32   49   61   87
X_trans[,8] <- abnormal_short_term_variability_norm$x.t
colnames(X_trans)[8] <- "abnormal_short_term_variability_norm"

 
mean_value_of_st_var_norm <- bestNormalize(X_trans$mean_value_of_short_term_variability)
mean_value_of_st_var_norm
## Best Normalizing transformation with 2126 Observations
##  Estimated Normality Statistics (Pearson P / df, lower => more normal):
##  - arcsinh(x): 2.5215
##  - Box-Cox: 2.156
##  - Exp(x): 86.6351
##  - Log_b(x+a): 2.5266
##  - No transform: 5.4903
##  - orderNorm (ORQ): 1.7439
##  - sqrt(x + a): 2.4089
##  - Yeo-Johnson: 2.1268
## Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
##  
## Based off these, bestNormalize chose:
## orderNorm Transformation with 2126 nonmissing obs and ties
##  - 57 unique values 
##  - Original quantiles:
##   0%  25%  50%  75% 100% 
##  0.2  0.7  1.2  1.7  7.0
X_trans[,9] <- mean_value_of_st_var_norm$x.t
colnames(X_trans)[9] <- "mean_value_of_st_var_norm"

perc_of_time_w_abn_long_term_var_norm <- bestNormalize(X_trans$percentage_of_time_with_abnormal_long_term_variability)
X_trans[,10] <- perc_of_time_w_abn_long_term_var_norm$x.t
colnames(X_trans)[10] <- "perc_of_time_w_abn_long_term_var_norm"

mean_value_of_lt_var_norm  <- bestNormalize(X_trans$mean_value_of_long_term_variability)
X_trans[,11] <- mean_value_of_lt_var_norm$x.t
colnames(X_trans)[11] <- "mean_value_of_lt_var_norm"

histogram_width_norm <- bestNormalize(X_trans$histogram_width)
X_trans[,12] <- histogram_width_norm$x.t
colnames(X_trans)[12] <- "histogram_width_norm"

histogram_max_norm <- bestNormalize(X_trans$histogram_max)
X_trans[,13] <- histogram_max_norm$x.t
colnames(X_trans)[13] <- "histogram_max_norm"

histogram_number_piks_norm <- bestNormalize(X_trans$histogram_number_of_peaks)
X_trans[,14] <- histogram_number_piks_norm$x.t
colnames(X_trans)[14] <- "histogram_number_piks_norm"

histogram_number_zeros_norm <- bestNormalize(X_trans$histogram_number_of_zeroes)
X_trans[,15] <- histogram_number_zeros_norm$x.t
colnames(X_trans)[15] <- "histogram_number_zeros_norm"


histogram_tendency_norm <- bestNormalize(X_trans$histogram_tendency +1)
X_trans[,16] <- histogram_tendency_norm$x.t
colnames(X_trans)[16] <- "histogram_tendency_norm"

Outlier Treatment

summary(X_trans)
##  baseline_value_norm accelerations_norm fetal_movement_norm
##  Min.   :-2.939015   Min.   :-0.80554   Min.   :-0.4532    
##  1st Qu.:-0.687125   1st Qu.:-0.80554   1st Qu.:-0.4532    
##  Median :-0.009432   Median : 0.06371   Median :-0.4532    
##  Mean   : 0.000341   Mean   : 0.05167   Mean   : 0.0000    
##  3rd Qu.: 0.628566   3rd Qu.: 0.74503   3rd Qu.: 0.1643    
##  Max.   : 3.497088   Max.   : 3.49709   Max.   : 7.3655    
##  uterine_contractions_norm light_decelerations_norm severe_decelerations_norm
##  Min.   :-1.41810          Min.   :-0.6375          Min.   :-0.05746         
##  1st Qu.:-0.67671          1st Qu.:-0.6375          1st Qu.:-0.05746         
##  Median :-0.14077          Median :-0.6375          Median :-0.05746         
##  Mean   : 0.01627          Mean   : 0.0000          Mean   : 0.00000         
##  3rd Qu.: 0.83192          3rd Qu.: 0.3729          3rd Qu.:-0.05746         
##  Max.   : 3.49709          Max.   : 4.4453          Max.   :17.39459         
##  prolongued_decelerations_norm abnormal_short_term_variability_norm
##  Min.   :-0.2687               Min.   :-3.307677                   
##  1st Qu.:-0.2687               1st Qu.:-0.678195                   
##  Median :-0.2687               Median : 0.008843                   
##  Mean   : 0.0000               Mean   : 0.000098                   
##  3rd Qu.:-0.2687               3rd Qu.: 0.668581                   
##  Max.   : 8.2066               Max.   : 3.497088                   
##  mean_value_of_st_var_norm perc_of_time_w_abn_long_term_var_norm
##  Min.   :-2.288520         Min.   :-0.6965                      
##  1st Qu.:-0.656832         1st Qu.:-0.6965                      
##  Median : 0.032429         Median :-0.6965                      
##  Mean   : 0.003249         Mean   : 0.0000                      
##  3rd Qu.: 0.645900         3rd Qu.: 0.5914                      
##  Max.   : 3.497088         Max.   : 3.0076                      
##  mean_value_of_lt_var_norm histogram_width_norm histogram_max_norm 
##  Min.   :-1.849122         Min.   :-3.307677    Min.   :-3.307677  
##  1st Qu.:-0.668581         1st Qu.:-0.667107    1st Qu.:-0.659028  
##  Median :-0.004127         Median : 0.000000    Median : 0.002948  
##  Mean   : 0.006749         Mean   : 0.000017    Mean   :-0.000237  
##  3rd Qu.: 0.667844         3rd Qu.: 0.670794    3rd Qu.: 0.650263  
##  Max.   : 3.497088         Max.   : 3.497088    Max.   : 2.986468  
##  histogram_number_piks_norm histogram_number_zeros_norm histogram_tendency_norm
##  Min.   :-1.95715           Min.   : 0.0000             Min.   :-2.5754        
##  1st Qu.:-0.53566           1st Qu.: 0.0000             1st Qu.:-0.3859        
##  Median :-0.15806           Median : 0.0000             Median :-0.3859        
##  Mean   : 0.01012           Mean   : 0.3236             Mean   : 0.0000        
##  3rd Qu.: 0.69912           3rd Qu.: 0.0000             3rd Qu.: 1.0109        
##  Max.   : 3.49709           Max.   :10.0000             Max.   : 1.0109        
##   fetal_health 
##  Healthy:1655  
##  Warning: 471  
##                
##                
##                
## 

By looking at the histograms and new summary of the variables after transformation, there are 4 variables that should be tested for outliers. To do this I will use the Rosner´s test. This is a very useful test, not only because it provides us with a formal test to check for outliers in each variable, but also because it can detect more than one at the same time. This test is also designed to detect problems related to masking: this is when two outliers are very close and one of them goes undetected. I will use k=4 as the number of suspected outliers in the variables.

test_3 <- rosnerTest(X_trans$fetal_movement_norm,k=10)
test_3$all.stats
##    i        Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0 -1.534320e-17 1.0000000 7.365482     697 7.365482   4.220179    TRUE
## 2  1 -3.466109e-03 0.9873790 7.332904     695 7.430147   4.220070    TRUE
## 3  2 -6.920144e-03 0.9746858 7.275562     694 7.471620   4.219960    TRUE
## 4  3 -1.035042e-02 0.9620061 7.267336      20 7.565115   4.219851    TRUE
## 5  4 -1.378006e-02 0.9491621 7.151231     692 7.548775   4.219742    TRUE
## 6  5 -1.715819e-02 0.9365396 7.117731      19 7.618352   4.219632    TRUE
## 7  6 -2.052370e-02 0.9238426 7.075646     696 7.681146   4.219523    TRUE
## 8  7 -2.387253e-02 0.9110992 7.050282     693 7.764417   4.219414    TRUE
## 9  8 -2.721255e-02 0.8982448 7.033325      17 7.860371   4.219304    TRUE
## 10 9 -3.054771e-02 0.8852428 6.939366     700 7.873448   4.219194    TRUE
test_6 <- rosnerTest(X_trans$severe_decelerations_norm,k=10)
test_6$all.stats
##    i        Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0  1.000759e-17 1.0000000 17.39459    1489 17.39459   4.220179    TRUE
## 2  1 -8.185691e-03 0.9262559 17.39459    1490 18.78831   4.220070    TRUE
## 3  2 -1.637909e-02 0.8459503 17.39459    1792 20.58156   4.219960    TRUE
## 4  3 -2.458021e-02 0.7569974 17.39459    1793 23.01088   4.219851    TRUE
## 5  4 -3.278906e-02 0.6558880 17.39459    1794 26.57067   4.219742    TRUE
## 6  5 -4.100564e-02 0.5357829 17.39459    1795 32.54229   4.219632    TRUE
## 7  6 -4.922998e-02 0.3790344 17.39459    1796 46.02174   4.219523    TRUE
## 8  7 -5.746208e-02 0.0000000       NA      NA       NA   4.219414      NA
## 9  8            NA        NA       NA      NA       NA   4.219304      NA
## 10 9            NA        NA       NA      NA       NA   4.219194      NA
test_7 <- rosnerTest(X_trans$prolongued_decelerations_norm,k=10)
test_7$all.stats
##    i        Mean.i      SD.i    Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1  0  3.075717e-17 1.0000000 8.206617     705 8.206617   4.220179    TRUE
## 2  1 -3.861937e-03 0.9842497 8.206617    1755 8.341866   4.220070    TRUE
## 3  2 -7.727511e-03 0.9682126 8.206617    1756 8.484029   4.219960    TRUE
## 4  3 -1.159673e-02 0.9518742 6.511566     683 6.852967   4.219851    TRUE
## 5  4 -1.467079e-02 0.9414989 6.511566     684 6.931752   4.219742    TRUE
## 6  5 -1.774775e-02 0.9309878 6.511566     703 7.013318   4.219632    TRUE
## 7  6 -2.082762e-02 0.9203363 6.511566    1350 7.097832   4.219523    TRUE
## 8  7 -2.391039e-02 0.9095394 6.511566    1750 7.185478   4.219414    TRUE
## 9  8 -2.699607e-02 0.8985919 6.511566    1754 7.276453   4.219304    TRUE
## 10 9 -3.008467e-02 0.8874879 6.511566    1757 7.370974   4.219194    TRUE
test_16 <- rosnerTest(X_trans$histogram_number_zeros_norm,k=10)
test_16$all.stats
##    i    Mean.i      SD.i Value Obs.Num     R.i+1 lambda.i+1 Outlier
## 1  0 0.3236124 0.7060594    10     142 13.704779   4.220179    TRUE
## 2  1 0.3190588 0.6742779     8     439 11.391358   4.220070    TRUE
## 3  2 0.3154426 0.6535001     7     440 10.228854   4.219960    TRUE
## 4  3 0.3122939 0.6373355     5     144  7.355162   4.219851    TRUE
## 5  4 0.3100848 0.6293034     5     643  7.452551   4.219742    TRUE
## 6  5 0.3078736 0.6211517     4      90  5.944001   4.219632    TRUE
## 7  6 0.3061321 0.6160969     4     390  5.995596   4.219523    TRUE
## 8  7 0.3043889 0.6109905     3       6  4.411871   4.219414    TRUE
## 9  8 0.3031161 0.6083187     3       7  4.433340   4.219304    TRUE
## 10 9 0.3018422 0.6056300     3     391  4.455126   4.219194    TRUE

Even when these tests are throwing results as outliers, I decided I am not going to get rid of them. The reason lies on the structure of the data. This is due to the fact that most of the fetus in the sample are healthy, therefore most of them have very standard (most of the times equal to zero) measures of several variables. If I take out 10 observations that represent severe cases of unhelthy fetus, I would be extracting very useful information from my data and I would make it even more unbalanced towards healthy fetus This way, I will interpret that these values correspond to the most serious cases, the ones that are zero (or very repetead) correspond to healthy fetus.

Next step in our preprocessing of data is to standarize the data.

By standarizing we mean transforming the data to get a mean equal to zero, and standard deviation equal to one.

X_trans[1:16] <- (lapply(X_trans[1:16],scale, center=TRUE,scale=TRUE))
summary(X_trans)
##  baseline_value_norm.V1 accelerations_norm.V1 fetal_movement_norm.V1
##  Min.   :-2.951218      Min.   :-0.977989     Min.   :-0.453223     
##  1st Qu.:-0.690240      1st Qu.:-0.977989     1st Qu.:-0.453223     
##  Median :-0.009813      Median : 0.013740     Median :-0.453223     
##  Mean   : 0.000000      Mean   : 0.000000     Mean   : 0.000000     
##  3rd Qu.: 0.630761      3rd Qu.: 0.791053     3rd Qu.: 0.164257     
##  Max.   : 3.510858      Max.   : 3.930881     Max.   : 7.365482     
##  uterine_contractions_norm.V1 light_decelerations_norm.V1
##  Min.   :-1.509392            Min.   :-0.637540          
##  1st Qu.:-0.729227            1st Qu.:-0.637540          
##  Median :-0.165253            Median :-0.637540          
##  Mean   : 0.000000            Mean   : 0.000000          
##  3rd Qu.: 0.858307            3rd Qu.: 0.372930          
##  Max.   : 3.662875            Max.   : 4.445262          
##  severe_decelerations_norm.V1 prolongued_decelerations_norm.V1
##  Min.   :-0.057462            Min.   :-0.268691               
##  1st Qu.:-0.057462            1st Qu.:-0.268691               
##  Median :-0.057462            Median :-0.268691               
##  Mean   : 0.000000            Mean   : 0.000000               
##  3rd Qu.:-0.057462            3rd Qu.:-0.268691               
##  Max.   :17.394594            Max.   : 8.206617               
##  abnormal_short_term_variability_norm.V1 mean_value_of_st_var_norm.V1
##  Min.   :-3.312330                       Min.   :-2.313644           
##  1st Qu.:-0.679227                       1st Qu.:-0.666382           
##  Median : 0.008757                       Median : 0.029458           
##  Mean   : 0.000000                       Mean   : 0.000000           
##  3rd Qu.: 0.669404                       3rd Qu.: 0.648784           
##  Max.   : 3.501807                       Max.   : 3.527187           
##  perc_of_time_w_abn_long_term_var_norm.V1 mean_value_of_lt_var_norm.V1
##  Min.   :-0.6964575                       Min.   :-1.889534           
##  1st Qu.:-0.6964575                       1st Qu.:-0.687580           
##  Median :-0.6964575                       Median :-0.011073           
##  Mean   : 0.0000000                       Mean   : 0.000000           
##  3rd Qu.: 0.5913592                       3rd Qu.: 0.673086           
##  Max.   : 3.0076059                       Max.   : 3.553650           
##  histogram_width_norm.V1 histogram_max_norm.V1 histogram_number_piks_norm.V1
##  Min.   :-3.310674       Min.   :-3.313034     Min.   :-2.028587            
##  1st Qu.:-0.667725       1st Qu.:-0.659905     1st Qu.:-0.562792            
##  Median :-0.000017       Median : 0.003190     Median :-0.173418            
##  Mean   : 0.000000       Mean   : 0.000000     Mean   : 0.000000            
##  3rd Qu.: 0.671382       3rd Qu.: 0.651601     3rd Qu.: 0.710472            
##  Max.   : 3.500222       Max.   : 2.991758     Max.   : 3.595642            
##  histogram_number_zeros_norm.V1 histogram_tendency_norm.V1  fetal_health 
##  Min.   :-0.458336              Min.   :-2.5754344         Healthy:1655  
##  1st Qu.:-0.458336              1st Qu.:-0.3858943         Warning: 471  
##  Median :-0.458336              Median :-0.3858943                       
##  Mean   : 0.000000              Mean   : 0.0000000                       
##  3rd Qu.:-0.458336              3rd Qu.: 1.0108970                       
##  Max.   :13.704779              Max.   : 1.0108970

We have now 17 variables. 16 of them are numerical, had been transformed and standarized (behave more or less like gaussians, range from 0 to 1). Transformation is crucial for LDA and QDA, and standarization is crucial for LDA (as it assumes equal correlation.)

Now, let´s perform principal component analysis to check which are the predictors that better explain the variability of our data (hence, carry more information about our sample).

Principal Component Analysis

X_transs <- X_trans[,1:16]
Y <- X_trans [,17]

X_pcs <- prcomp(X_transs,scale=TRUE)
colors_X <- c(color_2,color_3)[1*(Y=="Healthy")+1]
par(mfrow=c(1,1))
plot(X_pcs$x[,1:2],pch=19,col=colors_X)

get_eigenvalue(X_pcs)
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1   4.3537266       27.2107913                    27.21079
## Dim.2   1.7542996       10.9643723                    38.17516
## Dim.3   1.6908935       10.5680847                    48.74325
## Dim.4   1.3423103        8.3894392                    57.13269
## Dim.5   1.2371720        7.7323250                    64.86501
## Dim.6   1.0067872        6.2924203                    71.15743
## Dim.7   0.9151083        5.7194269                    76.87686
## Dim.8   0.7738885        4.8368033                    81.71366
## Dim.9   0.6719093        4.1994334                    85.91310
## Dim.10  0.5860738        3.6629615                    89.57606
## Dim.11  0.5669214        3.5432586                    93.11932
## Dim.12  0.4438006        2.7737535                    95.89307
## Dim.13  0.2403512        1.5021950                    97.39526
## Dim.14  0.1952423        1.2202644                    98.61553
## Dim.15  0.1498119        0.9363245                    99.55185
## Dim.16  0.0717034        0.4481463                   100.00000

We can see, at a first glance, that our two first principal components do not explain the difference between healthy and warning fetus, as both groups are not exactly separated in the plot. Let´s do a deeper analysis. Let´s run another code and check with parallel computind and a simple plot how many components should we keep.

pca_output <- PCA(X_transs,ncp = 8, graph = FALSE)
pca_output$var$contrib
##                                              Dim.1        Dim.2     Dim.3
## baseline_value_norm                    2.047309488 2.585831e+00 30.099178
## accelerations_norm                     4.788399956 4.761246e+00  2.211752
## fetal_movement_norm                    0.828561539 4.303529e+00  2.271503
## uterine_contractions_norm              3.362182152 4.403995e-01  7.256322
## light_decelerations_norm               7.855496410 7.970260e+00  2.284149
## severe_decelerations_norm              0.053213303 1.325733e+00  2.011617
## prolongued_decelerations_norm          2.021322539 2.096183e+01  3.446026
## abnormal_short_term_variability_norm   6.782333849 1.952482e+01  1.760197
## mean_value_of_st_var_norm             18.231728048 2.362290e-01  1.423859
## perc_of_time_w_abn_long_term_var_norm 12.940053570 3.640952e+00  2.139699
## mean_value_of_lt_var_norm              0.002657663 2.590346e+01  2.135923
## histogram_width_norm                  17.997035182 4.090731e-01  3.688097
## histogram_max_norm                     8.414556761 2.032200e+00 16.502580
## histogram_number_piks_norm            11.795246186 1.829371e+00  5.252262
## histogram_number_zeros_norm            2.835652294 1.054048e-04  1.839416
## histogram_tendency_norm                0.044251060 4.074964e+00 15.677421
##                                              Dim.4        Dim.5        Dim.6
## baseline_value_norm                    0.524394820 8.592464e+00 1.297845e-03
## accelerations_norm                    27.026729695 7.865195e+00 8.181894e-01
## fetal_movement_norm                   10.384928452 3.095615e+01 5.425782e-01
## uterine_contractions_norm              1.983605485 2.253980e+01 4.119085e+00
## light_decelerations_norm              13.502327025 8.462885e-01 1.262634e+00
## severe_decelerations_norm              2.070396591 5.447026e-04 7.384799e+01
## prolongued_decelerations_norm          0.483312681 5.283718e+00 3.615469e+00
## abnormal_short_term_variability_norm   0.003193994 5.188202e-01 1.726015e-01
## mean_value_of_st_var_norm              0.322436055 7.354417e-02 2.417531e-01
## perc_of_time_w_abn_long_term_var_norm  2.428600697 1.902246e-02 3.016388e-04
## mean_value_of_lt_var_norm              2.753239235 1.480410e+01 1.506151e+00
## histogram_width_norm                   0.218625646 5.840983e-01 1.803279e-01
## histogram_max_norm                     4.443796479 5.314463e+00 2.877629e+00
## histogram_number_piks_norm             1.157970271 2.174607e+00 2.019101e-02
## histogram_number_zeros_norm           20.874057016 3.514262e-01 2.528479e+00
## histogram_tendency_norm               11.822385856 7.575661e-02 8.265326e+00
##                                             Dim.7       Dim.8
## baseline_value_norm                    0.74281555 12.49669442
## accelerations_norm                     3.36059293 11.21466258
## fetal_movement_norm                    5.84531079  1.80878141
## uterine_contractions_norm              0.06340299 17.02566515
## light_decelerations_norm               4.84068452  0.05694790
## severe_decelerations_norm             12.13852459  4.36831572
## prolongued_decelerations_norm          4.37162360  4.85277759
## abnormal_short_term_variability_norm   0.21692012  0.68697830
## mean_value_of_st_var_norm              0.11535043  0.23379172
## perc_of_time_w_abn_long_term_var_norm  0.93825136  0.77711914
## mean_value_of_lt_var_norm              8.85966640 15.54558483
## histogram_width_norm                   0.00646173  0.03453765
## histogram_max_norm                     8.45922862  2.57061832
## histogram_number_piks_norm             0.19067809  0.81924091
## histogram_number_zeros_norm           14.08660192 27.45866012
## histogram_tendency_norm               35.76388634  0.04962425
fviz_screeplot(pca_output,ncp=8,addlabels=T,barfill="blue",barcolor="red")

pca_output$eig
##         eigenvalue percentage of variance cumulative percentage of variance
## comp 1   4.3537266             27.2107913                          27.21079
## comp 2   1.7542996             10.9643723                          38.17516
## comp 3   1.6908935             10.5680847                          48.74325
## comp 4   1.3423103              8.3894392                          57.13269
## comp 5   1.2371720              7.7323250                          64.86501
## comp 6   1.0067872              6.2924203                          71.15743
## comp 7   0.9151083              5.7194269                          76.87686
## comp 8   0.7738885              4.8368033                          81.71366
## comp 9   0.6719093              4.1994334                          85.91310
## comp 10  0.5860738              3.6629615                          89.57606
## comp 11  0.5669214              3.5432586                          93.11932
## comp 12  0.4438006              2.7737535                          95.89307
## comp 13  0.2403512              1.5021950                          97.39526
## comp 14  0.1952423              1.2202644                          98.61553
## comp 15  0.1498119              0.9363245                          99.55185
## comp 16  0.0717034              0.4481463                         100.00000
pca_output_ret <- paran(X_transs,seed=1,graph = TRUE)
## 
## Using eigendecomposition of correlation matrix.
## Computing: 10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
## 
## 
## Results of Horn's Parallel Analysis for component retention
## 480 iterations, using the mean estimate
## 
## -------------------------------------------------- 
## Component   Adjusted    Unadjusted    Estimated 
##             Eigenvalue  Eigenvalue    Bias 
## -------------------------------------------------- 
## 1           4.203744    4.353726      0.149982
## 2           1.635174    1.754299      0.119125
## 3           1.594786    1.690893      0.096107
## 4           1.266518    1.342310      0.075791
## 5           1.180104    1.237172      0.057067
## -------------------------------------------------- 
## 
## Adjusted eigenvalues > 1 indicate dimensions to retain.
## (5 components retained)

pca_output_ret$Retained
## [1] 5

We should stick to the first 5 principal components (explaining almost 65% of the variability of the model). We can check what are the variables that account the most to explaining this variability.

pca_output$eig[,3][1:5]
##   comp 1   comp 2   comp 3   comp 4   comp 5 
## 27.21079 38.17516 48.74325 57.13269 64.86501
fviz_contrib(pca_output, choice="var", axes=1,top=5)

fviz_contrib(pca_output, choice="var", axes=2,top=5)

fviz_contrib(pca_output, choice="var", axes=3,top=5)

fviz_contrib(pca_output, choice="var", axes=4,top=5)

fviz_contrib(pca_output, choice="var", axes=5,top=5)

# Check if the first two components give me any sign of subgroups.
colors_X <- c(color_2,color_3)[1*(Y=="Healthy")+1]
par(mfrow=c(1,1))
pairs(X_pcs$x[,1:5],col=colors_X,pch=19,main="The first three PCs")

We can see that component number 3 and number 2 are the ones that better explain for the difference between healthy and warning fetus.

Modeling on probabilities

Now, I will perform a Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Nayve Bayes, and Logistic Regression. All of these approaches are based in probabilities, and depend on a threshold after we predict the probabilities.

If the probability lies in an interval 0.5-1 it will be clasified as “Warning”, while if the probability lies in the interval 0-0.5 it will be classified as "Healthy.

So first, I will find each model maximizing accuracy, and discuss on the results. After that, I will consider that accuracy is not what I am chasing, but actually to minimize my cost-matrix. To do this, I will imagine this research was ordered by an organization (maybe an NGO) whose goal is to fight deaths that could be prevented. If this is the case, we may be more interested in losing some accuracy, but minimizing our cost (specially because the cost of diagnosing an unhealthy fetus as healthy is very expensive, in comparison with the other error.) This is called relative cost learning and I will apply this for the 4 models and then discuss on the result.

Chasing accuracy

Let´s try first just running the models and checking the output without the confusion matrix.

QDA

qda.class.datos <- qda(fetal_health ~ ., X_trans)
qda.class.datos
## Call:
## qda(fetal_health ~ ., data = X_trans)
## 
## Prior probabilities of groups:
##   Healthy   Warning 
## 0.7784572 0.2215428 
## 
## Group means:
##         baseline_value_norm accelerations_norm fetal_movement_norm
## Healthy           -0.135757          0.2329389         -0.04374192
## Warning            0.477023         -0.8185008          0.15370038
##         uterine_contractions_norm light_decelerations_norm
## Healthy                 0.1395107               0.01742883
## Warning                -0.4902126              -0.06124141
##         severe_decelerations_norm prolongued_decelerations_norm
## Healthy               -0.04691704                    -0.1816333
## Warning                0.16485710                     0.6382233
##         abnormal_short_term_variability_norm mean_value_of_st_var_norm
## Healthy                           -0.2659398                 0.1857047
## Warning                            0.9344594                -0.6525292
##         perc_of_time_w_abn_long_term_var_norm mean_value_of_lt_var_norm
## Healthy                            -0.2552409                0.09904268
## Warning                             0.8968656               -0.34801622
##         histogram_width_norm histogram_max_norm histogram_number_piks_norm
## Healthy            0.1048106         0.02712086                 0.04653131
## Warning           -0.3682835        -0.09529727                -0.16350172
##         histogram_number_zeros_norm histogram_tendency_norm
## Healthy                  0.01662044              0.04342063
## Warning                 -0.05840089             -0.15257144

Let´s try our LDA

lda.class.datos <- lda(fetal_health ~ ., X_trans)
lda.class.datos
## Call:
## lda(fetal_health ~ ., data = X_trans)
## 
## Prior probabilities of groups:
##   Healthy   Warning 
## 0.7784572 0.2215428 
## 
## Group means:
##         baseline_value_norm accelerations_norm fetal_movement_norm
## Healthy           -0.135757          0.2329389         -0.04374192
## Warning            0.477023         -0.8185008          0.15370038
##         uterine_contractions_norm light_decelerations_norm
## Healthy                 0.1395107               0.01742883
## Warning                -0.4902126              -0.06124141
##         severe_decelerations_norm prolongued_decelerations_norm
## Healthy               -0.04691704                    -0.1816333
## Warning                0.16485710                     0.6382233
##         abnormal_short_term_variability_norm mean_value_of_st_var_norm
## Healthy                           -0.2659398                 0.1857047
## Warning                            0.9344594                -0.6525292
##         perc_of_time_w_abn_long_term_var_norm mean_value_of_lt_var_norm
## Healthy                            -0.2552409                0.09904268
## Warning                             0.8968656               -0.34801622
##         histogram_width_norm histogram_max_norm histogram_number_piks_norm
## Healthy            0.1048106         0.02712086                 0.04653131
## Warning           -0.3682835        -0.09529727                -0.16350172
##         histogram_number_zeros_norm histogram_tendency_norm
## Healthy                  0.01662044              0.04342063
## Warning                 -0.05840089             -0.15257144
## 
## Coefficients of linear discriminants:
##                                               LD1
## baseline_value_norm                   -0.07388097
## accelerations_norm                    -0.37756476
## fetal_movement_norm                    0.01919919
## uterine_contractions_norm             -0.24588152
## light_decelerations_norm               0.15628379
## severe_decelerations_norm              0.17057022
## prolongued_decelerations_norm          0.77598935
## abnormal_short_term_variability_norm   0.44740227
## mean_value_of_st_var_norm             -0.15952718
## perc_of_time_w_abn_long_term_var_norm  0.59942242
## mean_value_of_lt_var_norm              0.04396969
## histogram_width_norm                  -0.26895514
## histogram_max_norm                     0.50389820
## histogram_number_piks_norm            -0.02005009
## histogram_number_zeros_norm            0.04025855
## histogram_tendency_norm                0.09623987
plot(lda.class.datos)

Logistic Regression

Let´s first divide our sample into a pair of sets: training sets and testing set. This is crucial to test our model, since we want to see the performance of the predictions not along the same data we used to build the model, but for unseen data that we know is part of the generator process of information. We will take 60% of the data for training, and 40% for testing. This is a first approach, as we are guessing the p=0.6, and we are not deducing it fron cross-validation.

spl = createDataPartition(X_trans$fetal_health, p = 0.8, list = FALSE)

FetusTrain = X_trans[spl,]
FetusTest = X_trans[-spl,]

table(FetusTrain$fetal_health)
## 
## Healthy Warning 
##    1324     377
table(FetusTest$fetal_health)
## 
## Healthy Warning 
##     331      94
levels(X_trans$fetal_health)
## [1] "Healthy" "Warning"
library(VGAM)

log.fit = glm (fetal_health ~ ., family=binomial(link="logit"), data=FetusTrain)

# We are not really interested in coefficients, as what we are interested in is probabilities. This is why I define an object prediction on the test set.

prob.test = predict(log.fit, newdata=FetusTest, type="response")
head(prob.test)
##          1          4          7         15         18         32 
## -0.8489228 -8.0205896  1.9895590 -4.5465278 -0.7147929 -3.3631263
# Now let´s place these into a list, but in labels

pred.test <- as.factor(levels(FetusTest$fetal_health)[max.col(prob.test)])

summary(log.fit)
## 
## Call:
## glm(formula = fetal_health ~ ., family = binomial(link = "logit"), 
##     data = FetusTrain)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3068  -0.2435  -0.0855  -0.0161   3.2663  
## 
## Coefficients:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                           -3.130624   0.193356 -16.191  < 2e-16 ***
## baseline_value_norm                   -0.058318   0.188001  -0.310  0.75641    
## accelerations_norm                    -1.748658   0.235679  -7.420 1.17e-13 ***
## fetal_movement_norm                    0.003792   0.129395   0.029  0.97662    
## uterine_contractions_norm             -0.614438   0.123108  -4.991 6.01e-07 ***
## light_decelerations_norm               0.353371   0.174664   2.023  0.04306 *  
## severe_decelerations_norm              0.215107   0.067125   3.205  0.00135 ** 
## prolongued_decelerations_norm          1.621264   0.156990  10.327  < 2e-16 ***
## abnormal_short_term_variability_norm   1.245358   0.155823   7.992 1.33e-15 ***
## mean_value_of_st_var_norm             -0.608821   0.213369  -2.853  0.00433 ** 
## perc_of_time_w_abn_long_term_var_norm  0.646350   0.146970   4.398 1.09e-05 ***
## mean_value_of_lt_var_norm              0.161473   0.177934   0.907  0.36415    
## histogram_width_norm                  -0.638153   0.240508  -2.653  0.00797 ** 
## histogram_max_norm                     1.332242   0.240150   5.548 2.90e-08 ***
## histogram_number_piks_norm             0.146322   0.152558   0.959  0.33750    
## histogram_number_zeros_norm            0.122743   0.094123   1.304  0.19221    
## histogram_tendency_norm                0.252067   0.142036   1.775  0.07595 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1799.55  on 1700  degrees of freedom
## Residual deviance:  688.04  on 1684  degrees of freedom
## AIC: 722.04
## 
## Number of Fisher Scoring iterations: 7
tab=table(prob.test>0.5,FetusTest$fetal_health)
tab
##        
##         Healthy Warning
##   FALSE     315      20
##   TRUE       16      74

What can we get from this? Well, first that not all the variables seem statistically significative in our analysis. At the same time, we can say that there are some that obivusoly have a lot to do with our problem like: baseline value, accelerations,prolongued decelerations, abnormal short term variability, histogram max (these are significative at 99% of confidence level). At the same time, the interpretation is not so straightforward, as this is not a linear problem. We can say, for instance, that if we increase the baseline value in “1” unit, the logarith of the odds would decrease by 1.8 aproximatly (if all the rest stays untouched). If we take the exponential of this value we would get the odds:

exp(-1.7993)
## [1] 0.1654146

This means if we increase the baseline value by 1 unit, we would decrease our chances of getting a “warning” fetus by 0.1654.

Now, in order to have a better look at out models, we will use CARET. CARET helps us a lot, because not only we can use almost the same structure of code, but also because we can access to a lot of things inside of it.

Let´s use Caret now.

Train Control

I will repeat 5 times a 10-fold cross-validation to search for good hiperparameters for the training and testing set. This is: I will divide the data in 10 parts, 9 for training and 1 for testing. And this will be repeated 5 times.

ctrl <- trainControl(method = "repeatedcv", 
                     repeats = 5,
                     number = 10)

Logistic Regression

lrFit <- train(fetal_health ~ ., 
               method = "glmnet",
               family = "binomial",
               data = FetusTrain,
               tuneGrid = expand.grid(alpha = seq(0, 2, 0.1), lambda = seq(0, .1, 0.01)),
               metric = "Accuracy",
               trControl = ctrl)
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
## Warning in (function (x, y, family = c("gaussian", "binomial", "poisson", :
## alpha >1; set to 1
print(lrFit)
## glmnet 
## 
## 1701 samples
##   16 predictor
##    2 classes: 'Healthy', 'Warning' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1531, 1531, 1531, 1531, 1530, 1531, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda  Accuracy   Kappa    
##   0.0    0.00    0.9027687  0.7002332
##   0.0    0.01    0.9027687  0.7002332
##   0.0    0.02    0.9027687  0.7002332
##   0.0    0.03    0.9018323  0.6943994
##   0.0    0.04    0.9013617  0.6895479
##   0.0    0.05    0.8993659  0.6795598
##   0.0    0.06    0.8970170  0.6697889
##   0.0    0.07    0.8953693  0.6619510
##   0.0    0.08    0.8956053  0.6609502
##   0.0    0.09    0.8941907  0.6548070
##   0.0    0.10    0.8932516  0.6497366
##   0.1    0.00    0.9019535  0.7091795
##   0.1    0.01    0.9042961  0.7108125
##   0.1    0.02    0.9031217  0.7017375
##   0.1    0.03    0.9037140  0.6994385
##   0.1    0.04    0.9019507  0.6909376
##   0.1    0.05    0.8992489  0.6793672
##   0.1    0.06    0.8968987  0.6688783
##   0.1    0.07    0.8965430  0.6654162
##   0.1    0.08    0.8963077  0.6616518
##   0.1    0.09    0.8954848  0.6568759
##   0.1    0.10    0.8936073  0.6470499
##   0.2    0.00    0.9020704  0.7095969
##   0.2    0.01    0.9042975  0.7104338
##   0.2    0.02    0.9035978  0.7031078
##   0.2    0.03    0.9039528  0.7002566
##   0.2    0.04    0.9013638  0.6893074
##   0.2    0.05    0.8997188  0.6808417
##   0.2    0.06    0.8981922  0.6726663
##   0.2    0.07    0.8960800  0.6616337
##   0.2    0.08    0.8939630  0.6518913
##   0.2    0.09    0.8904294  0.6354070
##   0.2    0.10    0.8885436  0.6247200
##   0.3    0.00    0.9020704  0.7095969
##   0.3    0.01    0.9039466  0.7093201
##   0.3    0.02    0.9041881  0.7046042
##   0.3    0.03    0.9030116  0.6979258
##   0.3    0.04    0.9013659  0.6886981
##   0.3    0.05    0.8996019  0.6791405
##   0.3    0.06    0.8953686  0.6596953
##   0.3    0.07    0.8919623  0.6450420
##   0.3    0.08    0.8901934  0.6343273
##   0.3    0.09    0.8892523  0.6278497
##   0.3    0.10    0.8845477  0.6071999
##   0.4    0.00    0.9023050  0.7103593
##   0.4    0.01    0.9043003  0.7102521
##   0.4    0.02    0.9034857  0.7030877
##   0.4    0.03    0.9026628  0.6959440
##   0.4    0.04    0.8999555  0.6823326
##   0.4    0.05    0.8964288  0.6659377
##   0.4    0.06    0.8921955  0.6473214
##   0.4    0.07    0.8904281  0.6365583
##   0.4    0.08    0.8887817  0.6277241
##   0.4    0.09    0.8846605  0.6076780
##   0.4    0.10    0.8797317  0.5832499
##   0.5    0.00    0.9024220  0.7107597
##   0.5    0.01    0.9044172  0.7104104
##   0.5    0.02    0.9038352  0.7037496
##   0.5    0.03    0.9019590  0.6927711
##   0.5    0.04    0.8988974  0.6773249
##   0.5    0.05    0.8943063  0.6570831
##   0.5    0.06    0.8910163  0.6406343
##   0.5    0.07    0.8885422  0.6282954
##   0.5    0.08    0.8841878  0.6070087
##   0.5    0.09    0.8781967  0.5794174
##   0.5    0.10    0.8698561  0.5398882
##   0.6    0.00    0.9023036  0.7104730
##   0.6    0.01    0.9041833  0.7099289
##   0.6    0.02    0.9025410  0.6991622
##   0.6    0.03    0.8993714  0.6830886
##   0.6    0.04    0.8963063  0.6669548
##   0.6    0.05    0.8930094  0.6510749
##   0.6    0.06    0.8900737  0.6360604
##   0.6    0.07    0.8865422  0.6180927
##   0.6    0.08    0.8774943  0.5791677
##   0.6    0.09    0.8689149  0.5383804
##   0.6    0.10    0.8596331  0.4934755
##   0.7    0.00    0.9023036  0.7104730
##   0.7    0.01    0.9028912  0.7061587
##   0.7    0.02    0.9026614  0.6990488
##   0.7    0.03    0.8984268  0.6801758
##   0.7    0.04    0.8948952  0.6619065
##   0.7    0.05    0.8927748  0.6490714
##   0.7    0.06    0.8905395  0.6356636
##   0.7    0.07    0.8817200  0.5983341
##   0.7    0.08    0.8716160  0.5519663
##   0.7    0.09    0.8604566  0.4996763
##   0.7    0.10    0.8498759  0.4470472
##   0.8    0.00    0.9023036  0.7104730
##   0.8    0.01    0.9027757  0.7061534
##   0.8    0.02    0.9030144  0.7001653
##   0.8    0.03    0.8952482  0.6693717
##   0.8    0.04    0.8936011  0.6565372
##   0.8    0.05    0.8920675  0.6445889
##   0.8    0.06    0.8867796  0.6204334
##   0.8    0.07    0.8761960  0.5748795
##   0.8    0.08    0.8647992  0.5215140
##   0.8    0.09    0.8515202  0.4557584
##   0.8    0.10    0.8400044  0.3946768
##   0.9    0.00    0.9023036  0.7104730
##   0.9    0.01    0.9033625  0.7078255
##   0.9    0.02    0.9005444  0.6924654
##   0.9    0.03    0.8932516  0.6622416
##   0.9    0.04    0.8933637  0.6554107
##   0.9    0.05    0.8910073  0.6395601
##   0.9    0.06    0.8807816  0.5970709
##   0.9    0.07    0.8703163  0.5480860
##   0.9    0.08    0.8564572  0.4825049
##   0.9    0.09    0.8443533  0.4186734
##   0.9    0.10    0.8311807  0.3441856
##   1.0    0.00    0.9023036  0.7104730
##   1.0    0.01    0.9041868  0.7102631
##   1.0    0.02    0.8981915  0.6847333
##   1.0    0.03    0.8934855  0.6625510
##   1.0    0.04    0.8927741  0.6527032
##   1.0    0.05    0.8877111  0.6271507
##   1.0    0.06    0.8786639  0.5869390
##   1.0    0.07    0.8651487  0.5242996
##   1.0    0.08    0.8495208  0.4471913
##   1.0    0.09    0.8341275  0.3640001
##   1.0    0.10    0.8220103  0.2891412
##   1.1    0.00    0.9023036  0.7104730
##   1.1    0.01    0.9041868  0.7102631
##   1.1    0.02    0.8981915  0.6847333
##   1.1    0.03    0.8934855  0.6625510
##   1.1    0.04    0.8927741  0.6527032
##   1.1    0.05    0.8877111  0.6271507
##   1.1    0.06    0.8786639  0.5869390
##   1.1    0.07    0.8651487  0.5242996
##   1.1    0.08    0.8495208  0.4471913
##   1.1    0.09    0.8341275  0.3640001
##   1.1    0.10    0.8220103  0.2891412
##   1.2    0.00    0.9023036  0.7104730
##   1.2    0.01    0.9041868  0.7102631
##   1.2    0.02    0.8981915  0.6847333
##   1.2    0.03    0.8934855  0.6625510
##   1.2    0.04    0.8927741  0.6527032
##   1.2    0.05    0.8877111  0.6271507
##   1.2    0.06    0.8786639  0.5869390
##   1.2    0.07    0.8651487  0.5242996
##   1.2    0.08    0.8495208  0.4471913
##   1.2    0.09    0.8341275  0.3640001
##   1.2    0.10    0.8220103  0.2891412
##   1.3    0.00    0.9023036  0.7104730
##   1.3    0.01    0.9041868  0.7102631
##   1.3    0.02    0.8981915  0.6847333
##   1.3    0.03    0.8934855  0.6625510
##   1.3    0.04    0.8927741  0.6527032
##   1.3    0.05    0.8877111  0.6271507
##   1.3    0.06    0.8786639  0.5869390
##   1.3    0.07    0.8651487  0.5242996
##   1.3    0.08    0.8495208  0.4471913
##   1.3    0.09    0.8341275  0.3640001
##   1.3    0.10    0.8220103  0.2891412
##   1.4    0.00    0.9023036  0.7104730
##   1.4    0.01    0.9041868  0.7102631
##   1.4    0.02    0.8981915  0.6847333
##   1.4    0.03    0.8934855  0.6625510
##   1.4    0.04    0.8927741  0.6527032
##   1.4    0.05    0.8877111  0.6271507
##   1.4    0.06    0.8786639  0.5869390
##   1.4    0.07    0.8651487  0.5242996
##   1.4    0.08    0.8495208  0.4471913
##   1.4    0.09    0.8341275  0.3640001
##   1.4    0.10    0.8220103  0.2891412
##   1.5    0.00    0.9023036  0.7104730
##   1.5    0.01    0.9041868  0.7102631
##   1.5    0.02    0.8981915  0.6847333
##   1.5    0.03    0.8934855  0.6625510
##   1.5    0.04    0.8927741  0.6527032
##   1.5    0.05    0.8877111  0.6271507
##   1.5    0.06    0.8786639  0.5869390
##   1.5    0.07    0.8651487  0.5242996
##   1.5    0.08    0.8495208  0.4471913
##   1.5    0.09    0.8341275  0.3640001
##   1.5    0.10    0.8220103  0.2891412
##   1.6    0.00    0.9023036  0.7104730
##   1.6    0.01    0.9041868  0.7102631
##   1.6    0.02    0.8981915  0.6847333
##   1.6    0.03    0.8934855  0.6625510
##   1.6    0.04    0.8927741  0.6527032
##   1.6    0.05    0.8877111  0.6271507
##   1.6    0.06    0.8786639  0.5869390
##   1.6    0.07    0.8651487  0.5242996
##   1.6    0.08    0.8495208  0.4471913
##   1.6    0.09    0.8341275  0.3640001
##   1.6    0.10    0.8220103  0.2891412
##   1.7    0.00    0.9023036  0.7104730
##   1.7    0.01    0.9041868  0.7102631
##   1.7    0.02    0.8981915  0.6847333
##   1.7    0.03    0.8934855  0.6625510
##   1.7    0.04    0.8927741  0.6527032
##   1.7    0.05    0.8877111  0.6271507
##   1.7    0.06    0.8786639  0.5869390
##   1.7    0.07    0.8651487  0.5242996
##   1.7    0.08    0.8495208  0.4471913
##   1.7    0.09    0.8341275  0.3640001
##   1.7    0.10    0.8220103  0.2891412
##   1.8    0.00    0.9023036  0.7104730
##   1.8    0.01    0.9041868  0.7102631
##   1.8    0.02    0.8981915  0.6847333
##   1.8    0.03    0.8934855  0.6625510
##   1.8    0.04    0.8927741  0.6527032
##   1.8    0.05    0.8877111  0.6271507
##   1.8    0.06    0.8786639  0.5869390
##   1.8    0.07    0.8651487  0.5242996
##   1.8    0.08    0.8495208  0.4471913
##   1.8    0.09    0.8341275  0.3640001
##   1.8    0.10    0.8220103  0.2891412
##   1.9    0.00    0.9023036  0.7104730
##   1.9    0.01    0.9041868  0.7102631
##   1.9    0.02    0.8981915  0.6847333
##   1.9    0.03    0.8934855  0.6625510
##   1.9    0.04    0.8927741  0.6527032
##   1.9    0.05    0.8877111  0.6271507
##   1.9    0.06    0.8786639  0.5869390
##   1.9    0.07    0.8651487  0.5242996
##   1.9    0.08    0.8495208  0.4471913
##   1.9    0.09    0.8341275  0.3640001
##   1.9    0.10    0.8220103  0.2891412
##   2.0    0.00    0.9023036  0.7104730
##   2.0    0.01    0.9041868  0.7102631
##   2.0    0.02    0.8981915  0.6847333
##   2.0    0.03    0.8934855  0.6625510
##   2.0    0.04    0.8927741  0.6527032
##   2.0    0.05    0.8877111  0.6271507
##   2.0    0.06    0.8786639  0.5869390
##   2.0    0.07    0.8651487  0.5242996
##   2.0    0.08    0.8495208  0.4471913
##   2.0    0.09    0.8341275  0.3640001
##   2.0    0.10    0.8220103  0.2891412
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.5 and lambda = 0.01.

We have two hiperparamiters: one controlls the lasue, and the other one the ridge. For each combination of hiperparamiters, the computer is repeating it 5 times and computing the accuracy.

lrPred = predict(lrFit, FetusTest)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     309      14
##    Warning      22      80
##                                         
##                Accuracy : 0.9153        
##                  95% CI : (0.8847, 0.94)
##     No Information Rate : 0.7788        
##     P-Value [Acc > NIR] : 5.46e-14      
##                                         
##                   Kappa : 0.7614        
##                                         
##  Mcnemar's Test P-Value : 0.2433        
##                                         
##             Sensitivity : 0.9335        
##             Specificity : 0.8511        
##          Pos Pred Value : 0.9567        
##          Neg Pred Value : 0.7843        
##              Prevalence : 0.7788        
##          Detection Rate : 0.7271        
##    Detection Prevalence : 0.7600        
##       Balanced Accuracy : 0.8923        
##                                         
##        'Positive' Class : Healthy       
## 

Several things to point out.

  1. The accuracy here is 0.9082, and the kappa is 0.7263. These look like pretty good results overall. It is highly probable we do get a better result with logistic regression than the beyesian classificators, as all of them assume that our data comes from a multinormal. We know that some of our data works as a categorical, ordinal variable. This is going to give us not very good results with the bayesian classificators, as Logistic regression does not assume any prior probability.

  2. Specificity is 0.7553. This is important, as is the error that worries me the most. As one would imagine, it would be much worse to diagnose an unhelathy fetus as healthy, than the opposite. So we are much more interested on reducing this error, than to diagnose a healthy baby as unhealthy. Nevertheless, we still have to be causious about the second error, as we do care if we diagnose a fetus as unhealthy, even though it is realy healthy. This is because this error can cause a lot of consequences: the mother may want an abortion, it may cause depression, and more. Nevertheless sensitivity is more than 0.95, that is great.

  3. I am, indeed, willing to lose some accuracy in favor of gaining some more precision in not comitting this error that worries me so much. In this case, out of the 425 fetus of the testing set, 23 of them would probably die and maybe kill their mothers because I diagnosed them as healthy. Even if 5% does not sound like a big mistake, if I consider this as life that could be saved, I would be willing to change my goal and chase a smaller worst mistake. I will discuss this later.

Let´s plot the importance of each variable.

lr_imp <- varImp(lrFit, scale = F)
plot(lr_imp, scales = list(y = list(cex = .95)))

We can see that the most important variable here is the prolongued decelerations, followed by abnormal short term variability.

Let´s check the ROC Curve. This is a very useful tool, as it allows us to evaluate all possible thresholds for splitting predicted probabilities into predicted classes.

lrPred <- predict(lrFit,FetusTest,type="prob")
colAUC(lrPred, FetusTest[["fetal_health"]], plotROC = TRUE)

##                       Healthy   Warning
## Healthy vs. Warning 0.9635212 0.9635212

As we can see, this summarizes the performance across all thresholds. When the area under the curve is 0.5 it referes to a random guess (it´s like having a kappa equal to zero). If the AUC is 1, we are facing a model that always predicts right. As we can see now, the area under the curve here is almost 0.96. This is a very good indication of the performance of our model.

LDA

Now, it´s time to try and fit a model with linear discriminany analysis. I will try this with 3 differents methods (or techinques): sparseLDA, regular LDA and StepLDA. I will choose the one with the best performance, but also taking into account if there are big differenecs in the error proportions (shown in the confusion matrix).

First, we will perform LDA Sparse. We can tune the hyperparameters and we can even check the combinations of predictos-parameters, to make a good decision.

LDAGrid <-  expand.grid(.NumVars = c(2:15), 
                        .lambda = c(0, 0.01, 0.1, 1, 10, 100))

ldasparseFit <- train(fetal_health ~ ., 
                method = "sparseLDA", 
                data = FetusTrain,
                tuneGrid = LDAGrid,
                metric = "Accuracy",
                trControl = ctrl)

ldaPred = predict(ldasparseFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     301      15
##    Warning      30      79
##                                           
##                Accuracy : 0.8941          
##                  95% CI : (0.8609, 0.9217)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 4.358e-10       
##                                           
##                   Kappa : 0.7093          
##                                           
##  Mcnemar's Test P-Value : 0.03689         
##                                           
##             Sensitivity : 0.9094          
##             Specificity : 0.8404          
##          Pos Pred Value : 0.9525          
##          Neg Pred Value : 0.7248          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7082          
##    Detection Prevalence : 0.7435          
##       Balanced Accuracy : 0.8749          
##                                           
##        'Positive' Class : Healthy         
## 
plot(ldasparseFit)

ldasparseFit$bestTune 
##    NumVars lambda
## 66      12    100

Here we can see that our best desition would be to leave with 15 predictors and a lambda=100. Nevertheless, we want to compare models, and would like to stay with the same amount of predictors for all of them (16). At the same time, we see that we have a lot of accuracy when we take a few variables (seems like two predictors is a good idea for any value of lambda).

Again, CARET will find the hyperparameters to maximize our accuracy.

ldasparseFit <- train(fetal_health ~ ., 
                method = "sparseLDA", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)

ldaPred = predict(ldasparseFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     305      15
##    Warning      26      79
##                                           
##                Accuracy : 0.9035          
##                  95% CI : (0.8714, 0.9299)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 1.049e-11       
##                                           
##                   Kappa : 0.7312          
##                                           
##  Mcnemar's Test P-Value : 0.1183          
##                                           
##             Sensitivity : 0.9215          
##             Specificity : 0.8404          
##          Pos Pred Value : 0.9531          
##          Neg Pred Value : 0.7524          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7176          
##    Detection Prevalence : 0.7529          
##       Balanced Accuracy : 0.8809          
##                                           
##        'Positive' Class : Healthy         
## 
ldasparseFit$bestTune 
##   NumVars lambda
## 8      16  1e-04
ldastepFit <- train(fetal_health ~ ., 
                method = "stepLDA", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82561;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.146
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82636;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.073
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83137;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.024
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876;  in: "mean_value_of_st_var_norm";  variables (1): mean_value_of_st_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.077
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82755;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.07
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8268;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.056
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82624;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.046
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83072;  in: "perc_of_time_w_abn_long_term_var_norm";  variables (1): perc_of_time_w_abn_long_term_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.037
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82443;  in: "mean_value_of_st_var_norm";  variables (1): mean_value_of_st_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.045
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82637;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.354
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8268;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.04
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82952;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.095
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82614;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.041
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.096
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82758;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.029
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83032;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.166
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82757;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.395
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82505;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.351
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82311;  in: "mean_value_of_st_var_norm";  variables (1): mean_value_of_st_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.447
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82941;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.044
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82614;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.077
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83085;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.02
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82637;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.039
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82626;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.077
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82484;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.026
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82549;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.056
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83029;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.05
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82624;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.096
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82431;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.035
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82953;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.003
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82638;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.055
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82953;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.127
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82952;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.243
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.438
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82832;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##         0.0         0.0         2.1
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82559;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.147
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82549;  in: "mean_value_of_st_var_norm";  variables (1): mean_value_of_st_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.402
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82745;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.188
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.116
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82576;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.083
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.83017;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.02
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82822;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.062
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.8281;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.031
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82756;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.138
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82431;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.128
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1531 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82757;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.09
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82941;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.059
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82506;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.09
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1530 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82876;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.146
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1532 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82442;  in: "mean_value_of_st_var_norm";  variables (1): mean_value_of_st_var_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.203
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 1701 observations of 16 variables in 2 classes; direction: both
## stop criterion: improvement less than 5%.
## correctness rate: 0.82718;  in: "abnormal_short_term_variability_norm";  variables (1): abnormal_short_term_variability_norm 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##       0.000       0.000       2.365
ldaPred = predict(ldastepFit, FetusTest)

confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     309      53
##    Warning      22      41
##                                           
##                Accuracy : 0.8235          
##                  95% CI : (0.7839, 0.8586)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 0.013661        
##                                           
##                   Kappa : 0.4192          
##                                           
##  Mcnemar's Test P-Value : 0.000532        
##                                           
##             Sensitivity : 0.9335          
##             Specificity : 0.4362          
##          Pos Pred Value : 0.8536          
##          Neg Pred Value : 0.6508          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7271          
##    Detection Prevalence : 0.8518          
##       Balanced Accuracy : 0.6849          
##                                           
##        'Positive' Class : Healthy         
## 
ldaFit <- train(fetal_health ~ ., 
                method = "lda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
ldaPred = predict(ldaFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     305      14
##    Warning      26      80
##                                          
##                Accuracy : 0.9059         
##                  95% CI : (0.874, 0.9319)
##     No Information Rate : 0.7788         
##     P-Value [Acc > NIR] : 3.872e-12      
##                                          
##                   Kappa : 0.7388         
##                                          
##  Mcnemar's Test P-Value : 0.08199        
##                                          
##             Sensitivity : 0.9215         
##             Specificity : 0.8511         
##          Pos Pred Value : 0.9561         
##          Neg Pred Value : 0.7547         
##              Prevalence : 0.7788         
##          Detection Rate : 0.7176         
##    Detection Prevalence : 0.7506         
##       Balanced Accuracy : 0.8863         
##                                          
##        'Positive' Class : Healthy        
## 

Let´s plot each ROC plot and asses there Area under the Curve.

ldasparsePred <- predict(ldasparseFit,FetusTest,type="prob")
colAUC(ldasparsePred, FetusTest[["fetal_health"]], plotROC = TRUE)

##                       Healthy   Warning
## Healthy vs. Warning 0.9623321 0.9623321
ldastepPred <- predict(ldastepFit,FetusTest,type="prob")
colAUC(ldastepPred, FetusTest[["fetal_health"]], plotROC = TRUE)

##                       Healthy   Warning
## Healthy vs. Warning 0.8464518 0.8464518
ldaPred <- predict(ldaFit,FetusTest,type="prob")
colAUC(ldaPred, FetusTest[["fetal_health"]], plotROC = TRUE)

##                       Healthy   Warning
## Healthy vs. Warning 0.9623321 0.9623321

Betweem the three models, i will stick to the third one (method=lda), as its area under the curve is 0.95 (in comparison with 0.87 and 0.83 in the other two models). At the same time, we get the best accuracy and kappa (accuracy = 0.9012, kappa = 0.7196). Also, although the lda model does not have the biggest sensitivity of all (as the step method has a bigger value), it has the highest score of specificity (and this is the error we care the most). So we will stick to the third model.

QDA

qdaFit <- train(fetal_health ~ ., 
                method = "qda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
## Warning: model fit failed for Fold08.Rep1: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold06.Rep2: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold04.Rep3: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold03.Rep4: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning: model fit failed for Fold03.Rep5: parameter=none Error in qda.default(x, grouping, ...) : rank deficiency in group Healthy
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
qdaPred = predict(qdaFit, FetusTest)
confusionMatrix(qdaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     315      21
##    Warning      16      73
##                                         
##                Accuracy : 0.9129        
##                  95% CI : (0.882, 0.938)
##     No Information Rate : 0.7788        
##     P-Value [Acc > NIR] : 1.654e-13     
##                                         
##                   Kappa : 0.7424        
##                                         
##  Mcnemar's Test P-Value : 0.5108        
##                                         
##             Sensitivity : 0.9517        
##             Specificity : 0.7766        
##          Pos Pred Value : 0.9375        
##          Neg Pred Value : 0.8202        
##              Prevalence : 0.7788        
##          Detection Rate : 0.7412        
##    Detection Prevalence : 0.7906        
##       Balanced Accuracy : 0.8641        
##                                         
##        'Positive' Class : Healthy       
## 

When we run the QDA Model it comes with a problem (an error). This is because we have very unbalanced classes and we have some variables that almost does not change. This is interesting, but yet problematic. I knew this could become a problem, as there are a couple of variables with almost non variability. It should not be a surprise when we are trying to fit a quadratic model, in data that we can picture as a line. In detail, what is happening is that in one of the folds, there is a group (“Healthy”) that presents a variable with no variation, and thus it becomes impossible to invert the correlation matrix (singularity problem). The fact that the “problematic” group is the healthy one makes sense, as there are some variables in which most of the data is equal to zero for the healthy babies (like “severe decelerations”).

All in all, I will keep the results of this model, because out from this issue, it predicts fairly good. The accuracy is 0.9012 and the kappa is 0.697. It has a huge sensitivity value (0.9577) and an “acceptable” (0.7021) specificity value.

Naive Bayes

naiveFit <- train(fetal_health ~ ., 
                method = "naive_bayes", 
                data = FetusTrain,
                metric = "Accuracy")
ldaPred = predict(naiveFit, FetusTest)
confusionMatrix(ldaPred,FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     303      11
##    Warning      28      83
##                                           
##                Accuracy : 0.9082          
##                  95% CI : (0.8767, 0.9339)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 1.392e-12       
##                                           
##                   Kappa : 0.7498          
##                                           
##  Mcnemar's Test P-Value : 0.01041         
##                                           
##             Sensitivity : 0.9154          
##             Specificity : 0.8830          
##          Pos Pred Value : 0.9650          
##          Neg Pred Value : 0.7477          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7129          
##    Detection Prevalence : 0.7388          
##       Balanced Accuracy : 0.8992          
##                                           
##        'Positive' Class : Healthy         
## 

Naive Bayes is a faily good model too, as it performs pretty alright in predicting. Accuracy is 0.8824 and and kappa is 0.6661. Our less worring mistake in predicting meassued by sensivility is 0.9154 and specificity is 0.7660. We still make a lot of expensive mistakes.

In general, if we are worried about accuracy of predictions, then we would stick (from the models performed above) with our logistic regression model, as it has the higest accuracy and kappa values. Although our lda model is also very good, almost predicts as good as our logistic model. Nevertheless, I was expecting this, as one good thing about logistic regression model is that it does not assume any prior probabilistic distribution for the variables, while the bayes classificators are based on assuming normality. Even though I transformed all my variables to make them behave “closer” to a gaussian, we know that there are some that have still a lot of skewness (are not simetrical). Also, we have some variables that don´t have many different values and this is not a problem with logistic regression.

Cost Sensitive Analysis

Let´s change the theshold, because as we discussed before, I want to put more weight on the mistake of diagnosing as a helthy fetus, one that is actually sick. So by changing the threshold from 0.5 to 0.4, for example, we will increase our cheap mistake (diagnosing unhelathy a healthy fetus) and decrease our expensive mistake (diagnosing healthy an unhealthy fetus).So we will increase one, in exchange of decreasing the other one we will be loosing accuracy in our prediction. From 0.5 to 0.4, the fetus is going to be diagnosed as “warning” if the probability is bigger than 40%. We know that any move from 50%-50% of the threshold will make us lose accuracy, as that is the threshold that maximizes accuracy.

Until now, we had left CARET estimate a good value for the threshold, now I will estimate the best value of it, taking into account the weight of each cost. To do this I proceed to build a relative cost matrix.

Cost Matrix

Prediction/Reference Healthy Warning
Healthy 0 2
Warning 0.5 1
relative.cost <- c(0, 0.5, 2, 1)

The biggest cost is determined with a weight of double the other mistake (as it is much more important not to make the worst mistake in comparison to the “not so bad” mistake). We have to be more sure that a fetus is healthy before classifying it as “healthy”.

Let´s now calculate a good threshold for our models, but optimizing this new decision rule based on the weight of each cost.

LDA

cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold

j <- 0
ctrl <- trainControl(method = "none")

for (threshold in seq(0.25,0.65,0.05)){

  j <- j + 1
  cat(j)
  for(i in 1:30){
    
    # partition data intro training (80%) and testing sets (20%)
    d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
    # select training sample
    
    train<-FetusTrain[d,]
    test <-FetusTrain[-d,]  

    
    ldaFit <- train(fetal_health ~ ., 
                    method = "lda", 
                    data = train,
                    metric = "Accuracy",
                    trControl = ctrl)
    lrProb = predict(ldaFit, test, type="prob")
    
    lrPred = rep("Healthy", nrow(test))
    lrPred[which(lrProb[,2] > threshold)] = "Warning"
    lrPred = factor(lrPred)
    
    CM = confusionMatrix(lrPred, test$fetal_health)$table
    
    cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
    
  }
}
## 123456789

Now, I present a boxplot to check the distribution of each threshold value according to the cost the produce. This will give me the possibility of chosing the threshold that minimizes my cost (multiplying the cost matrix with the confusion matrix).

boxplot(cost.i, main = "Hyper-parameter selection",
        ylab = "cost",
        xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")

Instead of maximizing the accuracy we are minimizing the cost. And we can see in the boxplot that the threshold value that corresponds to 0.3 looks very low. Let´s try with this threshold.

# optimal threshold
threshold = 0.3

# final prediction
ldaFit <- train(fetal_health ~ ., 
                #method = "lda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
ldaPred = predict(ldaFit, FetusTest)

lrProb = predict(ldaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     302       3
##    Warning      29      91
##                                           
##                Accuracy : 0.9247          
##                  95% CI : (0.8954, 0.9479)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 4.783e-16       
##                                           
##                   Kappa : 0.8011          
##                                           
##  Mcnemar's Test P-Value : 9.897e-06       
##                                           
##             Sensitivity : 0.9124          
##             Specificity : 0.9681          
##          Pos Pred Value : 0.9902          
##          Neg Pred Value : 0.7583          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7106          
##    Detection Prevalence : 0.7176          
##       Balanced Accuracy : 0.9402          
##                                           
##        'Positive' Class : Healthy         
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2623529

As we can see, now that we use a threshold of 0.3, we get: our specificity is now 0.9255 and we make less “worst mistakes” than before. Indeed, out of 425 fetus of the testing set, we only diagnose as healthy 7 fetus that are really un healthy. Our relative cost is of 0.267.

Naive Bayes

cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold

j <- 0
ctrl <- trainControl(method = "none")

for (threshold in seq(0.25,0.65,0.05)){

  j <- j + 1
  cat(j)
  for(i in 1:30){
    
    # partition data intro training (80%) and testing sets (20%)
    d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
    # select training sample
    
    train<-FetusTrain[d,]
    test <-FetusTrain[-d,]  

    
    ldaFit <- train(fetal_health ~ ., 
                    method = "naive_bayes",
                    data = train,
                    metric = "Accuracy")
    lrProb = predict(ldaFit, test, type="prob")
    
    lrPred = rep("Healthy", nrow(test))
    lrPred[which(lrProb[,2] > threshold)] = "Warning"
    lrPred = factor(lrPred)
    
    CM = confusionMatrix(lrPred, test$fetal_health)$table
    
    cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
    
  }
}
## 123456789

Now, I will plot the different thresholds with its corresponding hyperparameter.

boxplot(cost.i, main = "Hyper-parameter selection",
        ylab = "cost",
        xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")

It looks like 0.4 is a good threshold to minimize my cost. Let´s check the confusion matrix for our new model.

# optimal threshold
threshold = 0.4

# final prediction
naivebayFit <- train(fetal_health ~ ., 
                method = "naive_bayes", 
                data = FetusTrain,
                metric = "Accuracy")
qdaPred = predict(naivebayFit, FetusTest)

lrProb = predict(naivebayFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     300      10
##    Warning      31      84
##                                           
##                Accuracy : 0.9035          
##                  95% CI : (0.8714, 0.9299)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 1.049e-11       
##                                           
##                   Kappa : 0.7407          
##                                           
##  Mcnemar's Test P-Value : 0.001787        
##                                           
##             Sensitivity : 0.9063          
##             Specificity : 0.8936          
##          Pos Pred Value : 0.9677          
##          Neg Pred Value : 0.7304          
##              Prevalence : 0.7788          
##          Detection Rate : 0.7059          
##    Detection Prevalence : 0.7294          
##       Balanced Accuracy : 0.9000          
##                                           
##        'Positive' Class : Healthy         
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2811765

We did improve our worst error in comparison to our naive bayes when we were trying to maximize accuracy. Our relative error is: 0.305 and now out of 425 we only predict 21 fetus that are unhealthy as healthy.

QDA

I can´t run the same formula for the QDA model, as there is Rank defficiancy in some cases. In this case, I will just try different values myself and check which would bring the lowest relative cost or highest specificity.

# optimal threshold
threshold = 0.3

# final prediction
qdaFit <- train(fetal_health ~ ., 
                method = "qda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)

qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     308      17
##    Warning      23      77
##                                          
##                Accuracy : 0.9059         
##                  95% CI : (0.874, 0.9319)
##     No Information Rate : 0.7788         
##     P-Value [Acc > NIR] : 3.872e-12      
##                                          
##                   Kappa : 0.7329         
##                                          
##  Mcnemar's Test P-Value : 0.4292         
##                                          
##             Sensitivity : 0.9305         
##             Specificity : 0.8191         
##          Pos Pred Value : 0.9477         
##          Neg Pred Value : 0.7700         
##              Prevalence : 0.7788         
##          Detection Rate : 0.7247         
##    Detection Prevalence : 0.7647         
##       Balanced Accuracy : 0.8748         
##                                          
##        'Positive' Class : Healthy        
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2882353

Now, let´s try with threshold: 0.4

# optimal threshold
threshold = 0.4

# final prediction
qdaFit <- train(fetal_health ~ ., 
                method = "qda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)

qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     314      18
##    Warning      17      76
##                                          
##                Accuracy : 0.9176         
##                  95% CI : (0.8873, 0.942)
##     No Information Rate : 0.7788         
##     P-Value [Acc > NIR] : 1.75e-14       
##                                          
##                   Kappa : 0.76           
##                                          
##  Mcnemar's Test P-Value : 1              
##                                          
##             Sensitivity : 0.9486         
##             Specificity : 0.8085         
##          Pos Pred Value : 0.9458         
##          Neg Pred Value : 0.8172         
##              Prevalence : 0.7788         
##          Detection Rate : 0.7388         
##    Detection Prevalence : 0.7812         
##       Balanced Accuracy : 0.8786         
##                                          
##        'Positive' Class : Healthy        
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2835294
# optimal threshold
threshold = 0.35

# final prediction
qdaFit <- train(fetal_health ~ ., 
                method = "qda", 
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
qdaPred = predict(qdaFit, FetusTest)

qdProb = predict(qdaFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(qdProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     312      17
##    Warning      19      77
##                                         
##                Accuracy : 0.9153        
##                  95% CI : (0.8847, 0.94)
##     No Information Rate : 0.7788        
##     P-Value [Acc > NIR] : 5.46e-14      
##                                         
##                   Kappa : 0.756         
##                                         
##  Mcnemar's Test P-Value : 0.8676        
##                                         
##             Sensitivity : 0.9426        
##             Specificity : 0.8191        
##          Pos Pred Value : 0.9483        
##          Neg Pred Value : 0.8021        
##              Prevalence : 0.7788        
##          Detection Rate : 0.7341        
##    Detection Prevalence : 0.7741        
##       Balanced Accuracy : 0.8809        
##                                         
##        'Positive' Class : Healthy       
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2835294

Between the three models that I tried, I would keep the third one (threshold equal to 0.35), as the cost is almost minimized, we don´t lose a lot of accuracy, but most of all, we do not make so many bad mistakes.

Logistic Regression

cost.i = matrix(NA, nrow = 30, ncol = 9)
# 30 replicates for training/testing sets for each of the 10 values of threshold

j <- 0
ctrl <- trainControl(method = "none")

for (threshold in seq(0.25,0.65,0.05)){

  j <- j + 1
  cat(j)
  for(i in 1:30){
    
    # partition data intro training (80%) and testing sets (20%)
    d <- createDataPartition(FetusTrain$fetal_health, p = 0.8, list = FALSE)
    # select training sample
    
    train<-FetusTrain[d,]
    test <-FetusTrain[-d,]  

    
    lrFit <- train(fetal_health ~ ., 
                    method = "glmnet",
                    family = "binomial",
                    data = train,
                    metric = "Accuracy",
                    trControl = ctrl)
    lrProb = predict(lrFit, test, type="prob")
    
    lrPred = rep("Healthy", nrow(test))
    lrPred[which(lrProb[,2] > threshold)] = "Warning"
    lrPred = factor(lrPred)
    
    CM = confusionMatrix(lrPred, test$fetal_health)$table
    
    cost.i[i,j] <- sum(relative.cost*CM)/nrow(test) # unitary cost
    
  }
}
## 123456789
boxplot(cost.i, main = "Hyper-parameter selection",
        ylab = "cost",
        xlab = "threshold value",names = seq(0.25,0.65,0.05),col="royalblue2")

# optimal threshold
threshold = 0.3

# final prediction
lrFit <- train(fetal_health ~ ., 
                method = "glmnet",
                family = "binomial",
                data = FetusTrain,
                metric = "Accuracy",
                trControl = ctrl)
lrPred = predict(lrFit, FetusTest)

lrProb = predict(lrFit, newdata=FetusTest, type="prob")
lrPred = rep("Healthy", nrow(FetusTest))
lrPred[which(lrProb[,2] > threshold)] = "Warning"
lrPred = as.factor(lrPred)
confusionMatrix(lrPred, FetusTest$fetal_health)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Healthy Warning
##    Healthy     289       8
##    Warning      42      86
##                                           
##                Accuracy : 0.8824          
##                  95% CI : (0.8479, 0.9114)
##     No Information Rate : 0.7788          
##     P-Value [Acc > NIR] : 2.672e-08       
##                                           
##                   Kappa : 0.6977          
##                                           
##  Mcnemar's Test P-Value : 3.058e-06       
##                                           
##             Sensitivity : 0.8731          
##             Specificity : 0.9149          
##          Pos Pred Value : 0.9731          
##          Neg Pred Value : 0.6719          
##              Prevalence : 0.7788          
##          Detection Rate : 0.6800          
##    Detection Prevalence : 0.6988          
##       Balanced Accuracy : 0.8940          
##                                           
##        'Positive' Class : Healthy         
## 
CM = confusionMatrix(lrPred, FetusTest$fetal_health)$table
sum(relative.cost*CM)/nrow(FetusTest) # unitary cost
## [1] 0.2894118

If we are trying to minimize the total error then we would have to take a threshold of 0.3. This way, our relative error would be of just 0.2894118. We can see how we do not make a lot of expensive mistakes anymore.

All in all, if we want to get our best model (from the ones we tried), but taking into account the cost that both mistakes and successes in predicting: if we want to minimize this cost we would stick to a lda model. This is interesting, as our conclusion is different than when we were trying to maximize accuracy. Now, in order to minimize our cost we should use a LDA. With my LDA here our minimum relative error is of 0.267, in comparison to the 0.2894 that our logistic regression model produces.